fix(openai-responses): emit reasoning blocks from responses stream #1156
Conversation
Warning: Rate limit exceeded
⌛ How to resolve this issue? After the wait time has elapsed, a review can be triggered. We recommend that you space out your commits to avoid hitting the rate limit.
🚦 How do rate limits work? CodeRabbit enforces hourly rate limits for each developer per organization. Our paid plans have higher rate limits than the trial, open-source, and free plans. In all cases, we re-allow further reviews after a brief timeout. Please see our FAQ for further information.
📒 Files selected for processing (1)
Walkthrough
Adds Responses API reasoning support: SSE parsing of reasoning deltas/summaries into ThinkingBlock(s) (preserving encrypted_content), request wiring for reasoning and text.verbosity, CLI buffering/UI visibility changes, and multiple tests validating parsing, request payloads, and streaming behavior.
Changes
Sequence Diagram
sequenceDiagram
participant Client
participant Provider as OpenAIResponsesProvider
participant ResponsesAPI as OpenAI Responses API
participant Parser as parseResponsesStream
participant History as Content/History
participant UI
Client->>Provider: generateChatCompletion(with reasoning.*, text.verbosity)
Provider->>ResponsesAPI: POST /responses (includes reasoning, include, verbosity)
ResponsesAPI-->>Parser: SSE events (reasoning_text.delta, reasoning_summary_text.delta, output_text.delta, output_item.*)
Parser->>Parser: accumulate reasoning deltas, dedupe, emit ThinkingBlock(s)
Parser->>Provider: yield IContent items (ThinkingBlock, TextBlock, ToolCall, usage)
Provider->>History: persist content (preserve encrypted_content)
UI->>History: read item (checks reasoning.includeInResponse)
alt includeInResponse = true
UI->>UI: render ThinkingBlock(s) + Text
else
UI->>UI: render Text only (thinking stored/hidden)
end
Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~45 minutes
Possibly related PRs
🚥 Pre-merge checks | ✅ Passed checks (4 passed)
LLxprt PR Review – PR #1156
Title: fix(openai-responses): emit reasoning blocks from responses stream
Issue Alignment
Evidence: The PR directly addresses issue #922's root cause.
Verdict: [OK] Aligned
Side Effects
Verdict: [OK] Contained
Code Quality
Strengths:
Observations:
Verdict: [OK] Sound
Tests and Coverage
Test files added (1688 lines total):
Coverage impact: [OK] Increase
Tests cover:
Verdict: [OK] Substantial coverage
Verdict
Ready. The PR correctly implements reasoning/thinking block support for the OpenAI Responses API by:
Actionable comments posted: 1
🤖 Fix all issues with AI agents
In `@packages/core/src/providers/openai/parseResponsesStream.ts`:
- Line 58: The code currently only handles "response.reasoning_text.delta" and
"response.reasoning_text.done" but misses
"response.reasoning_summary_text.delta"; update the event handling inside
parseResponsesStream (where reasoningText is declared and where events 95-117
are processed) to treat "response.reasoning_summary_text.delta" the same as
"response.reasoning_text.delta" by appending its payload to the existing
reasoningText buffer, and ensure any corresponding "done" handling merges or
finalizes reasoningText as done; reference the reasoningText variable and the
response event switch/if blocks to add the new branch for
"response.reasoning_summary_text.delta" so all reasoning variants are captured.
📜 Review details
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (2)
packages/core/src/providers/openai/parseResponsesStream.reasoning.test.ts
packages/core/src/providers/openai/parseResponsesStream.ts
🧰 Additional context used
🧠 Learnings (3)
📓 Common learnings
Learnt from: e2720pjk
Repo: vybestack/llxprt-code PR: 583
File: packages/core/src/providers/openai/OpenAIProvider.ts:935-959
Timestamp: 2025-11-16T22:51:26.374Z
Learning: In the llxprt-code codebase (packages/core/src/providers/openai/OpenAIProvider.ts), tools like `run_shell_command` use internal streaming only for real-time UI updates during execution, but each tool execution produces exactly ONE final `ToolResponseBlock` containing the full result. The streaming chunks are never sent to the LLM and are not converted into multiple tool messages. The OpenAI Chat Completions API requires that each tool call (tool_call_id) corresponds to exactly one message with role 'tool', so duplicate tool response detection that removes subsequent tool messages with the same tool_call_id is correct and necessary for API compliance.
Learnt from: acoliver
Repo: vybestack/llxprt-code PR: 0
File: :0-0
Timestamp: 2025-12-18T14:06:22.557Z
Learning: OpenAIResponsesProvider (packages/core/src/providers/openai-responses/OpenAIResponsesProvider.ts) currently bypasses the ephemeral truncation system by using direct `JSON.stringify(toolResponseBlock.result)` and needs to be updated to support ephemeral settings like the other providers.
📚 Learning: 2025-12-18T14:06:22.557Z
Learnt from: acoliver
Repo: vybestack/llxprt-code PR: 0
File: :0-0
Timestamp: 2025-12-18T14:06:22.557Z
Learning: OpenAIResponsesProvider (packages/core/src/providers/openai-responses/OpenAIResponsesProvider.ts) currently bypasses the ephemeral truncation system by using direct `JSON.stringify(toolResponseBlock.result)` and needs to be updated to support ephemeral settings like the other providers.
Applied to files:
packages/core/src/providers/openai/parseResponsesStream.ts
packages/core/src/providers/openai/parseResponsesStream.reasoning.test.ts
📚 Learning: 2025-11-16T22:51:26.374Z
Learnt from: e2720pjk
Repo: vybestack/llxprt-code PR: 583
File: packages/core/src/providers/openai/OpenAIProvider.ts:935-959
Timestamp: 2025-11-16T22:51:26.374Z
Learning: In the llxprt-code codebase (packages/core/src/providers/openai/OpenAIProvider.ts), tools like `run_shell_command` use internal streaming only for real-time UI updates during execution, but each tool execution produces exactly ONE final `ToolResponseBlock` containing the full result. The streaming chunks are never sent to the LLM and are not converted into multiple tool messages. The OpenAI Chat Completions API requires that each tool call (tool_call_id) corresponds to exactly one message with role 'tool', so duplicate tool response detection that removes subsequent tool messages with the same tool_call_id is correct and necessary for API compliance.
Applied to files:
packages/core/src/providers/openai/parseResponsesStream.ts
packages/core/src/providers/openai/parseResponsesStream.reasoning.test.ts
🧬 Code graph analysis (1)
packages/core/src/providers/openai/parseResponsesStream.reasoning.test.ts (1)
packages/core/src/providers/openai/parseResponsesStream.ts (1)
parseResponsesStream(51-237)
⏰ Context from checks skipped due to timeout of 270000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (6)
- GitHub Check: Test (ubuntu-latest, 24.x)
- GitHub Check: Test (macos-latest, 24.x)
- GitHub Check: E2E Test (Linux) - sandbox:none
- GitHub Check: E2E Test (Linux) - sandbox:docker
- GitHub Check: E2E Test (macOS)
- GitHub Check: Slow E2E - Win
🔇 Additional comments (4)
packages/core/src/providers/openai/parseResponsesStream.ts (2)
2-4: Doc update matches new behavior. Clear summary of the added reasoning/thinking handling.
187-202: Reasoning flush before usage looks correct. Emitting the thinking block prior to usage metadata aligns with the stated ordering requirement.
packages/core/src/providers/openai/parseResponsesStream.reasoning.test.ts (2)
4-17: SSE stream helper is clean and deterministic. The helper makes chunked SSE tests readable and reliable.
20-196: Test coverage is thorough. Covers reasoning-only, interleaving with text/tool calls, whitespace suppression, accumulation, usage metadata, and ordering.
Code Coverage Summary
CLI Package - Full Text Report
Core Package - Full Text Report
For detailed HTML reports, please see the 'coverage-reports-24.x-ubuntu-latest' artifact from the main CI run.
Actionable comments posted: 1
🤖 Fix all issues with AI agents
In `@packages/core/src/providers/openai/parseResponsesStream.ts`:
- Around line 96-120: The code merges both response.reasoning_text.delta and
response.reasoning_summary_text.delta into a single buffer (reasoningText)
causing raw reasoning and summarized reasoning to be combined; update the
handler in parseResponsesStream (the switch cases for
'response.reasoning_text.delta', 'response.reasoning_summary_text.delta',
'response.reasoning_text.done', and 'response.reasoning_summary_text.done') to
maintain two separate accumulators (e.g., reasoningText and
reasoningSummaryText) and on their respective *.done events yield distinct
thinking blocks using the appropriate buffer (or event.text fallback) before
clearing each buffer.
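A rough TypeScript sketch of the two-accumulator handling this prompt describes; the ThinkingBlock shape and event names come from this review, and the surrounding generator is heavily simplified.

// Simplified sketch; not the project's actual parser.
type ThinkingBlock = { type: 'thinking'; thought: string; sourceField: 'reasoning_content' };

function* onReasoningEvent(
  event: { type: string; delta?: string; text?: string },
  state: { reasoningText: string; reasoningSummaryText: string },
): Generator<ThinkingBlock> {
  switch (event.type) {
    case 'response.reasoning_text.delta':
      state.reasoningText += event.delta ?? '';
      break;
    case 'response.reasoning_summary_text.delta':
      state.reasoningSummaryText += event.delta ?? '';
      break;
    case 'response.reasoning_text.done': {
      const thought = event.text ?? state.reasoningText;
      if (thought.trim()) yield { type: 'thinking', thought, sourceField: 'reasoning_content' };
      state.reasoningText = ''; // reset only this buffer
      break;
    }
    case 'response.reasoning_summary_text.done': {
      const thought = event.text ?? state.reasoningSummaryText;
      if (thought.trim()) yield { type: 'thinking', thought, sourceField: 'reasoning_content' };
      state.reasoningSummaryText = '';
      break;
    }
  }
}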
🧹 Nitpick comments (1)
packages/core/src/providers/openai/parseResponsesStream.reasoning.test.ts (1)
20-219: Good test coverage overall. The test suite comprehensively covers the acceptance criteria from issue #922:
- Reasoning-only streams ✓
- Interleaved reasoning + text + tool calls ✓
- Empty/whitespace reasoning chunks ✓
- Reasoning emitted before usage metadata ✓
Consider adding a test for response.done (Codex variant) to explicitly verify parity with response.completed, though they share the same code path.
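One possible shape for that suggested test, as a hedged sketch: the createSseStream helper name mirrors the SSE helper described in this review, and the IContent blocks shape is an assumption, not a verified project API.

import { it, expect } from 'vitest';
import { parseResponsesStream } from './parseResponsesStream.js'; // path assumed

it('flushes accumulated reasoning on response.done (Codex variant)', async () => {
  const stream = createSseStream([
    'data: {"type":"response.reasoning_text.delta","delta":"partial thought"}\n\n',
    'data: {"type":"response.done","response":{"usage":{"input_tokens":1,"output_tokens":1}}}\n\n',
  ]);
  const items: Array<{ blocks?: Array<{ type: string; thought?: string }> }> = [];
  for await (const item of parseResponsesStream(stream)) {
    items.push(item);
  }
  // The reasoning accumulated before response.done should still surface as a thinking block.
  const thinking = items.flatMap((i) => i.blocks ?? []).find((b) => b.type === 'thinking');
  expect(thinking?.thought).toContain('partial thought');
});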
📜 Review details
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (2)
packages/core/src/providers/openai/parseResponsesStream.reasoning.test.ts
packages/core/src/providers/openai/parseResponsesStream.ts
🧰 Additional context used
🧠 Learnings (3)
📓 Common learnings
Learnt from: e2720pjk
Repo: vybestack/llxprt-code PR: 583
File: packages/core/src/providers/openai/OpenAIProvider.ts:935-959
Timestamp: 2025-11-16T22:51:26.374Z
Learning: In the llxprt-code codebase (packages/core/src/providers/openai/OpenAIProvider.ts), tools like `run_shell_command` use internal streaming only for real-time UI updates during execution, but each tool execution produces exactly ONE final `ToolResponseBlock` containing the full result. The streaming chunks are never sent to the LLM and are not converted into multiple tool messages. The OpenAI Chat Completions API requires that each tool call (tool_call_id) corresponds to exactly one message with role 'tool', so duplicate tool response detection that removes subsequent tool messages with the same tool_call_id is correct and necessary for API compliance.
Learnt from: acoliver
Repo: vybestack/llxprt-code PR: 0
File: :0-0
Timestamp: 2025-12-18T14:06:22.557Z
Learning: OpenAIResponsesProvider (packages/core/src/providers/openai-responses/OpenAIResponsesProvider.ts) currently bypasses the ephemeral truncation system by using direct `JSON.stringify(toolResponseBlock.result)` and needs to be updated to support ephemeral settings like the other providers.
📚 Learning: 2025-12-18T14:06:22.557Z
Learnt from: acoliver
Repo: vybestack/llxprt-code PR: 0
File: :0-0
Timestamp: 2025-12-18T14:06:22.557Z
Learning: OpenAIResponsesProvider (packages/core/src/providers/openai-responses/OpenAIResponsesProvider.ts) currently bypasses the ephemeral truncation system by using direct `JSON.stringify(toolResponseBlock.result)` and needs to be updated to support ephemeral settings like the other providers.
Applied to files:
packages/core/src/providers/openai/parseResponsesStream.ts
packages/core/src/providers/openai/parseResponsesStream.reasoning.test.ts
📚 Learning: 2025-11-16T22:51:26.374Z
Learnt from: e2720pjk
Repo: vybestack/llxprt-code PR: 583
File: packages/core/src/providers/openai/OpenAIProvider.ts:935-959
Timestamp: 2025-11-16T22:51:26.374Z
Learning: In the llxprt-code codebase (packages/core/src/providers/openai/OpenAIProvider.ts), tools like `run_shell_command` use internal streaming only for real-time UI updates during execution, but each tool execution produces exactly ONE final `ToolResponseBlock` containing the full result. The streaming chunks are never sent to the LLM and are not converted into multiple tool messages. The OpenAI Chat Completions API requires that each tool call (tool_call_id) corresponds to exactly one message with role 'tool', so duplicate tool response detection that removes subsequent tool messages with the same tool_call_id is correct and necessary for API compliance.
Applied to files:
packages/core/src/providers/openai/parseResponsesStream.ts
packages/core/src/providers/openai/parseResponsesStream.reasoning.test.ts
🧬 Code graph analysis (1)
packages/core/src/providers/openai/parseResponsesStream.reasoning.test.ts (1)
packages/core/src/providers/openai/parseResponsesStream.ts (1)
parseResponsesStream(52-240)
⏰ Context from checks skipped due to timeout of 270000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (7)
- GitHub Check: Test (macos-latest, 24.x)
- GitHub Check: Test (windows-latest, 24.x)
- GitHub Check: Test (ubuntu-latest, 24.x)
- GitHub Check: E2E Test (Linux) - sandbox:docker
- GitHub Check: E2E Test (Linux) - sandbox:none
- GitHub Check: Slow E2E - Win
- GitHub Check: E2E Test (macOS)
🔇 Additional comments (10)
packages/core/src/providers/openai/parseResponsesStream.ts (2)
104-120: LGTM! The done event handling correctly:
- Uses event.text as an override when provided (final complete text from the API), falling back to the accumulated reasoningText
- Applies a .trim() check to avoid emitting empty/whitespace-only thinking blocks
- Properly resets the accumulator after yielding
190-205: LGTM! Good defensive handling that flushes any pending reasoning content before emitting usage metadata. This correctly handles edge cases where reasoning delta events arrive but no explicit .done event follows before stream completion.
packages/core/src/providers/openai/parseResponsesStream.reasoning.test.ts (8)
4-18: LGTM! Clean and reusable test helper for simulating SSE streams. The pull-based approach correctly mimics how chunks would arrive from a real network stream.
21-47: LGTM! Comprehensive test for the basic reasoning flow. Good assertions on the accumulated thought content and sourceField.
49-88: LGTM! Good integration test covering the interleaving scenario with reasoning, text output, and tool calls. The assertions verify all three content types are correctly parsed.
90-112: LGTM! Good edge case coverage for empty/whitespace-only reasoning. The JSON-escaped \\n\\t correctly becomes actual whitespace characters after parsing.
114-141: LGTM! Good test verifying that multiple reasoning deltas are correctly accumulated into a single thinking block rather than emitting multiple blocks.
143-172: LGTM! Good integration test covering the full flow with reasoning, text, and usage metadata. Also verifies the cachedTokens default behavior.
174-196: LGTM! Good test coverage for the reasoning_summary_text event type, including verification that the text field from the done event is correctly used.
198-219: LGTM! Critical test verifying the flush behavior when reasoning deltas arrive without an explicit done event before stream completion. This ensures the PR objective of emitting reasoning before usage metadata is met.
…xt per CodeRabbit
Actionable comments posted: 5
🤖 Fix all issues with AI agents
In `@packages/cli/src/runtime/runtimeSettings.reasoningSummary.test.ts`:
- Around line 18-27: The test checking that PROFILE_EPHEMERAL_KEYS includes all
reasoning.* keys is missing the 'reasoning.verbosity' key; update the test in
runtimeSettings.reasoningSummary.test.ts to add an expectation that
PROFILE_EPHEMERAL_KEYS contains 'reasoning.verbosity' (i.e., add
expect(PROFILE_EPHEMERAL_KEYS).toContain('reasoning.verbosity'); alongside the
other reasoning.* assertions) so the PROFILE_EPHEMERAL_KEYS coverage remains
complete.
In `@packages/core/src/providers/openai-responses/OpenAIResponsesProvider.ts`:
- Around line 790-795: In OpenAIResponsesProvider.ts update the block that
assigns request.tool_choice when responsesTools exist so it does not
unconditionally overwrite a user-provided value: check whether
request.tool_choice is already set (or non-empty) before assigning 'auto' (leave
user-specified values like 'required' or a function name intact), and continue
to set request.parallel_tool_calls = true as before; reference the local
variable request and responsesTools to locate the change and mirror the
conditional logic used in buildResponsesRequest.ts.
In `@packages/core/src/providers/openai/parseResponsesStream.ts`:
- Around line 196-246: The reasoning blocks can be emitted twice via
response.reasoning_text.done and response.output_item.done paths; add a Set
(e.g., emittedReasoningIds) scoped to this stream/parser to deduplicate by
event.item.id or event.item.sequence_number: when handling
response.output_item.done (the branch using
event.item.summary/event.item.content and variables
reasoningText/reasoningSummaryText) check if the item's id or sequence_number is
already in emittedReasoningIds before assembling/yielding the thinking block,
and on yielding add that id/sequence_number to the set; also consult the same
set in the response.reasoning_text.done path (which yields from accumulated
reasoningText) so it doesn't re-emit content for an id that was already emitted,
and ensure the fallback resets still occur but do not clear the deduplication
Set.
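A short sketch of the per-stream deduplication this prompt describes; item ids are assumed to be strings, and the helper name is hypothetical.

// Create one deduper per parsed stream so ids never leak across requests.
function createReasoningDeduper() {
  const emittedReasoningIds = new Set<string>();
  return (itemId: string | undefined): boolean => {
    if (!itemId) return true; // nothing to dedupe on — emit and rely on buffer resets
    if (emittedReasoningIds.has(itemId)) return false; // already emitted for this item
    emittedReasoningIds.add(itemId);
    return true;
  };
}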
In `@packages/core/src/services/history/IContent.ts`:
- Around line 192-193: ContentValidation currently only treats the `thought`
field as content and will mark blocks with only `encryptedContent` as empty;
update the validation logic in ContentValidation to consider `encryptedContent`
(the IContent.encryptedContent property) as valid content as well so that
thinking blocks carrying only encryptedContent are not dropped during
processing. Locate the ContentValidation function/validator that references
`thought` and add a check that treats non-empty `encryptedContent` as
content-equivalent, ensuring any branches or emptiness checks use both `thought`
and `encryptedContent` when deciding to keep or drop a block.
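A minimal sketch of the emptiness check described above; the ThinkingBlock fields (thought, encryptedContent) are taken from this review and the helper name is illustrative, not project code.

interface ThinkingBlockSketch {
  type: 'thinking';
  thought: string;
  encryptedContent?: string;
}

// A block is "empty" only when it has neither visible thought nor encrypted reasoning.
function isEmptyThinkingBlock(block: ThinkingBlockSketch): boolean {
  const hasThought = block.thought.trim().length > 0;
  const hasEncrypted = (block.encryptedContent ?? '').length > 0;
  return !hasThought && !hasEncrypted;
}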
In `@packages/vscode-ide-companion/NOTICES.txt`:
- Around line 36-40: The NOTICES.txt entry for `@hono/node-server`@1.19.9
currently shows "License text not found."; replace that placeholder in the
`@hono/node-server`@1.19.9 block with the full MIT license text exactly as
provided (the standard MIT permission grant, conditions, and disclaimer),
ensuring the complete license block replaces the missing text for the
`@hono/node-server`@1.19.9 entry so the package record no longer shows "License
text not found."
🧹 Nitpick comments (3)
packages/core/src/providers/openai/parseResponsesStream.ts (1)
95-112: Consider removing the unused lastLoggedType tracking. The lastLoggedType variable is assigned on lines 98-100 but never used for any conditional logic. If it was intended for deduplicating log output, the condition should actually use it to skip logging.
♻️ Suggested cleanup
Either remove the unused variable:
-  // SSE event visibility for debugging reasoning support.
-  // We log to stderr directly so it shows up in debug logs even if
-  // Track last logged type to avoid duplicate logs
-  if (event.type !== lastLoggedType) {
-    lastLoggedType = event.type;
-  }
Or use it to actually deduplicate logs:
   if (event.type !== lastLoggedType) {
     lastLoggedType = event.type;
+    logger.debug(() => `New event type: ${event.type}`);
   }
-
-  // Debug: Log ALL events with full details
-  logger.debug(() => `SSE event: type=${event.type}, ...`);
packages/core/src/providers/openai-responses/OpenAIResponsesProvider.ts (2)
593-607: Potential ID collision with Date.now() for reasoning items. Using Date.now() to generate reasoning item IDs could produce duplicates if multiple thinking blocks are processed within the same millisecond. Consider using a more robust ID generation approach.
♻️ Suggested fix
+  let reasoningIdCounter = 0;
   // Add reasoning items if they have encrypted_content and reasoning should be included
   if (includeReasoningInContext) {
     for (const thinkingBlock of thinkingBlocks) {
       if (thinkingBlock.encryptedContent) {
         input.push({
           type: 'reasoning',
-          id: `reasoning_${Date.now()}`,
+          id: `reasoning_${Date.now()}_${reasoningIdCounter++}`,
           summary: [
             { type: 'summary_text', text: thinkingBlock.thought },
           ],
           encrypted_content: thinkingBlock.encryptedContent,
         });
       }
     }
   }
Or use a random suffix similar to generateSyntheticCallId().
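As an illustration of the random-suffix alternative, a hedged sketch (generateSyntheticCallId's actual implementation is not shown in this review, so the helper below is an assumption):

// Collision-resistant id for reasoning items: timestamp plus a short random suffix.
function makeReasoningId(): string {
  return `reasoning_${Date.now()}_${Math.random().toString(36).slice(2, 10)}`;
}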
762-770: Reasoning items are added then immediately removed in Codex mode. In Codex mode, reasoning items are added to input (lines 594-607) and then immediately filtered out (lines 762-764). Consider skipping the reasoning item addition when in Codex mode to avoid unnecessary work.
♻️ Suggested optimization
Move the Codex mode check earlier:
   // Add reasoning items if they have encrypted_content and reasoning should be included
-  if (includeReasoningInContext) {
+  // Skip reasoning items for Codex mode - they get filtered out later anyway
+  const baseURLForCheck = options.resolved.baseURL ?? this.getBaseURL();
+  const isCodexMode = this.isCodexMode(baseURLForCheck);
+  if (includeReasoningInContext && !isCodexMode) {
     for (const thinkingBlock of thinkingBlocks) {
Or simply document the current behavior if there's a reason for it.
📜 Review details
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
⛔ Files ignored due to path filters (1)
package-lock.json is excluded by !**/package-lock.json
📒 Files selected for processing (17)
packages/cli/src/providers/aliases/codex.config
packages/cli/src/providers/providerAliases.codex.reasoningSummary.test.ts
packages/cli/src/runtime/runtimeSettings.reasoningSummary.test.ts
packages/cli/src/runtime/runtimeSettings.ts
packages/cli/src/settings/ephemeralSettings.reasoningSummary.test.ts
packages/cli/src/settings/ephemeralSettings.reasoningVerbosity.test.ts
packages/cli/src/settings/ephemeralSettings.ts
packages/cli/src/ui/commands/setCommand.test.ts
packages/cli/src/ui/commands/setCommand.ts
packages/core/src/providers/openai-responses/OpenAIResponsesProvider.ts
packages/core/src/providers/openai-responses/__tests__/OpenAIResponsesProvider.reasoningInclude.test.ts
packages/core/src/providers/openai-responses/__tests__/OpenAIResponsesProvider.reasoningSummary.test.ts
packages/core/src/providers/openai-responses/__tests__/OpenAIResponsesProvider.textVerbosity.test.ts
packages/core/src/providers/openai/openaiRequestParams.ts
packages/core/src/providers/openai/parseResponsesStream.ts
packages/core/src/services/history/IContent.ts
packages/vscode-ide-companion/NOTICES.txt
🧰 Additional context used
🧠 Learnings (3)
📓 Common learnings
Learnt from: e2720pjk
Repo: vybestack/llxprt-code PR: 583
File: packages/core/src/providers/openai/OpenAIProvider.ts:935-959
Timestamp: 2025-11-16T22:51:26.374Z
Learning: In the llxprt-code codebase (packages/core/src/providers/openai/OpenAIProvider.ts), tools like `run_shell_command` use internal streaming only for real-time UI updates during execution, but each tool execution produces exactly ONE final `ToolResponseBlock` containing the full result. The streaming chunks are never sent to the LLM and are not converted into multiple tool messages. The OpenAI Chat Completions API requires that each tool call (tool_call_id) corresponds to exactly one message with role 'tool', so duplicate tool response detection that removes subsequent tool messages with the same tool_call_id is correct and necessary for API compliance.
📚 Learning: 2025-11-16T22:51:26.374Z
Learnt from: e2720pjk
Repo: vybestack/llxprt-code PR: 583
File: packages/core/src/providers/openai/OpenAIProvider.ts:935-959
Timestamp: 2025-11-16T22:51:26.374Z
Learning: In the llxprt-code codebase (packages/core/src/providers/openai/OpenAIProvider.ts), tools like `run_shell_command` use internal streaming only for real-time UI updates during execution, but each tool execution produces exactly ONE final `ToolResponseBlock` containing the full result. The streaming chunks are never sent to the LLM and are not converted into multiple tool messages. The OpenAI Chat Completions API requires that each tool call (tool_call_id) corresponds to exactly one message with role 'tool', so duplicate tool response detection that removes subsequent tool messages with the same tool_call_id is correct and necessary for API compliance.
Applied to files:
packages/core/src/providers/openai-responses/__tests__/OpenAIResponsesProvider.reasoningSummary.test.ts
packages/core/src/providers/openai/parseResponsesStream.ts
packages/core/src/providers/openai-responses/OpenAIResponsesProvider.ts
📚 Learning: 2026-01-13T19:28:00.789Z
Learnt from: acoliver
Repo: vybestack/llxprt-code PR: 0
File: :0-0
Timestamp: 2026-01-13T19:28:00.789Z
Learning: In the llxprt-code codebase (`packages/core/src/providers/anthropic/AnthropicProvider.ts`), Anthropic's API returns `contentBlock.input` as an already-parsed JavaScript object, not a JSON string. The code was incorrectly calling `JSON.stringify(contentBlock.input)` before passing it to `processToolParameters()`, which was designed for OpenAI-style string parameters. This causes arrays and other complex types to be corrupted into strings (e.g., `paths` array becomes a string `"[\"**/*.toml\"]"` instead of actual array). The fix is to use `contentBlock.input` directly without stringifying for Anthropic provider.
Applied to files:
packages/core/src/providers/openai-responses/OpenAIResponsesProvider.ts
🧬 Code graph analysis (7)
packages/core/src/providers/openai-responses/__tests__/OpenAIResponsesProvider.reasoningSummary.test.ts (2)
packages/core/src/index.ts (3)
setActiveProviderRuntimeContext (276-276)
createProviderRuntimeContext (274-274)
clearActiveProviderRuntimeContext (277-277)
packages/core/src/providers/openai-responses/OpenAIResponsesProvider.ts (1)
OpenAIResponsesProvider(70-1015)
packages/core/src/providers/openai-responses/__tests__/OpenAIResponsesProvider.textVerbosity.test.ts (1)
packages/core/src/providers/openai-responses/OpenAIResponsesProvider.ts (1)
OpenAIResponsesProvider(70-1015)
packages/cli/src/providers/providerAliases.codex.reasoningSummary.test.ts (1)
packages/cli/src/providers/providerAliases.ts (1)
loadProviderAliasEntries(115-150)
packages/cli/src/settings/ephemeralSettings.reasoningSummary.test.ts (1)
packages/cli/src/settings/ephemeralSettings.ts (2)
ephemeralSettingHelp (9-107)
isValidEphemeralSetting (556-562)
packages/cli/src/settings/ephemeralSettings.reasoningVerbosity.test.ts (1)
packages/cli/src/settings/ephemeralSettings.ts (2)
parseEphemeralSettingValue (127-525)
ephemeralSettingHelp (9-107)
packages/cli/src/runtime/runtimeSettings.reasoningSummary.test.ts (1)
packages/cli/src/runtime/runtimeSettings.ts (1)
PROFILE_EPHEMERAL_KEYS(895-942)
packages/core/src/providers/openai-responses/OpenAIResponsesProvider.ts (1)
packages/core/src/services/history/IContent.ts (1)
ThinkingBlock(177-194)
🔇 Additional comments (30)
packages/core/src/providers/openai/openaiRequestParams.ts (1)
81-84: LGTM! The filter correctly prevents the internal sentinel value 'none' from being sent to the API. The implementation follows the existing pattern and the comment clearly documents the intent.
packages/vscode-ide-companion/NOTICES.txt (3)
10-10: Version bump looks fine; verify notice matches upstream license.
1019-1019: qs version update OK; confirm license block matches the new version.
2272-2333: Added json-schema-typed notice looks complete; verify against upstream.
packages/cli/src/ui/commands/setCommand.test.ts (1)
110-118: Updated key list looks correct.
packages/cli/src/providers/aliases/codex.config (1)
8-11: Reasoning defaults look good for codex.
packages/cli/src/runtime/runtimeSettings.ts (1)
925-932: Key exposure update looks correct.
packages/cli/src/ui/commands/setCommand.ts (1)
315-335: New reasoning options are wired cleanly into /set.
packages/cli/src/settings/ephemeralSettings.reasoningSummary.test.ts (1)
1-70: Good coverage for reasoning.summary help + validation.
packages/cli/src/providers/providerAliases.codex.reasoningSummary.test.ts (1)
19-45: LGTM — solid coverage of codex.config reasoning defaults.
packages/cli/src/settings/ephemeralSettings.ts (1)
443-562: LGTM — validation and helper align with existing patterns.
packages/cli/src/settings/ephemeralSettings.reasoningVerbosity.test.ts (1)
13-70: LGTM — good coverage for reasoning.verbosity validation and help text.
packages/core/src/providers/openai-responses/__tests__/OpenAIResponsesProvider.reasoningSummary.test.ts (1)
42-366: LGTM — request payload assertions look solid across summary modes.
packages/core/src/providers/openai-responses/__tests__/OpenAIResponsesProvider.textVerbosity.test.ts (1)
44-352: LGTM — thorough coverage for text.verbosity behavior.
packages/core/src/providers/openai-responses/__tests__/OpenAIResponsesProvider.reasoningInclude.test.ts (8)
1-56: LGTM! Well-structured test setup. The test file has proper license headers, clear documentation of the test plan, and correctly manages mock state with beforeEach/afterEach hooks. The runtime context setup is appropriate for isolated testing.
58-118: LGTM! Good test coverage for the reasoning.enabled flag. The test correctly verifies that setting reasoning.enabled=true results in the include parameter being added to the request body.
120-183: LGTM! Validates the alternative trigger for the include parameter. Good coverage ensuring include is added when only reasoning.effort is set without reasoning.enabled.
185-246: LGTM! Important negative test case. Correctly validates that include is not added when reasoning is not explicitly requested.
248-318: LGTM! Validates internal field stripping. Correctly ensures that internal settings (enabled, includeInContext, includeInResponse) are stripped from API requests while preserving valid API fields like effort.
321-399: LGTM! Comprehensive SSE parsing test. Good coverage of parsing response.output_item.done events with reasoning type, verifying both the thinking block content and encrypted content preservation for round-trip.
401-472: LGTM! Tests delta accumulation correctly. Validates that response.reasoning_summary_text.delta events are properly accumulated and yielded as a thinking block.
475-663: LGTM! Good coverage of context inclusion behavior. Both tests correctly validate the reasoning.includeInContext setting - including encrypted content when true and stripping reasoning items when false.
packages/core/src/providers/openai/parseResponsesStream.ts (4)
10-12: LGTM! Good addition of debug logging. The DebugLogger initialization with a specific namespace helps with debugging reasoning support issues.
14-50: LGTM! Interface extensions match API schema. The extended ResponsesEvent interface properly accommodates the reasoning-related fields from the OpenAI Responses API.
126-172: LGTM! Separate buffer handling addresses previous review feedback. The implementation correctly maintains separate accumulators for reasoning_text and reasoning_summary_text as recommended in past reviews, with proper buffer reset after yielding.
292-320: LGTM! Defensive flush of reasoning buffers on completion. Good defensive measure to ensure any accumulated reasoning content is emitted before usage metadata, even if the stream ends without explicit done events.
packages/core/src/providers/openai-responses/OpenAIResponsesProvider.ts (4)
29-35: LGTM! Proper import of the ThinkingBlock type. The ThinkingBlock import is correctly added to support reasoning block handling in the provider.
541-556: LGTM! ResponsesInputItem type correctly extended. The reasoning variant matches the OpenAI Responses API expected format with type, id, a summary array, and encrypted_content.
797-859: LGTM! Comprehensive reasoning settings handling. The implementation correctly:
- Adds the include parameter when reasoning is enabled or effort is set
- Adds reasoning.summary when configured
- Validates and adds text.verbosity with proper lowercase normalization
- Includes appropriate debug logging throughout
351-367: LGTM! Function signature updated to include the reasoning type. The type union correctly includes the new reasoning variant to match ResponsesInputItem.
it('should include all reasoning.* keys in PROFILE_EPHEMERAL_KEYS', () => {
  // Verify all reasoning settings are saveable
  expect(PROFILE_EPHEMERAL_KEYS).toContain('reasoning.enabled');
  expect(PROFILE_EPHEMERAL_KEYS).toContain('reasoning.includeInContext');
  expect(PROFILE_EPHEMERAL_KEYS).toContain('reasoning.includeInResponse');
  expect(PROFILE_EPHEMERAL_KEYS).toContain('reasoning.format');
  expect(PROFILE_EPHEMERAL_KEYS).toContain('reasoning.stripFromContext');
  expect(PROFILE_EPHEMERAL_KEYS).toContain('reasoning.effort');
  expect(PROFILE_EPHEMERAL_KEYS).toContain('reasoning.maxTokens');
  expect(PROFILE_EPHEMERAL_KEYS).toContain('reasoning.summary');
Add reasoning.verbosity to the "all reasoning.* keys" assertion.
The test promises full reasoning coverage but omits reasoning.verbosity, so a regression could slip through unnoticed.
✅ Proposed fix
expect(PROFILE_EPHEMERAL_KEYS).toContain('reasoning.maxTokens');
expect(PROFILE_EPHEMERAL_KEYS).toContain('reasoning.summary');
+ expect(PROFILE_EPHEMERAL_KEYS).toContain('reasoning.verbosity');
📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
it('should include all reasoning.* keys in PROFILE_EPHEMERAL_KEYS', () => {
  // Verify all reasoning settings are saveable
  expect(PROFILE_EPHEMERAL_KEYS).toContain('reasoning.enabled');
  expect(PROFILE_EPHEMERAL_KEYS).toContain('reasoning.includeInContext');
  expect(PROFILE_EPHEMERAL_KEYS).toContain('reasoning.includeInResponse');
  expect(PROFILE_EPHEMERAL_KEYS).toContain('reasoning.format');
  expect(PROFILE_EPHEMERAL_KEYS).toContain('reasoning.stripFromContext');
  expect(PROFILE_EPHEMERAL_KEYS).toContain('reasoning.effort');
  expect(PROFILE_EPHEMERAL_KEYS).toContain('reasoning.maxTokens');
  expect(PROFILE_EPHEMERAL_KEYS).toContain('reasoning.summary');
  expect(PROFILE_EPHEMERAL_KEYS).toContain('reasoning.verbosity');
🤖 Prompt for AI Agents
In `@packages/cli/src/runtime/runtimeSettings.reasoningSummary.test.ts` around
lines 18 - 27, The test checking that PROFILE_EPHEMERAL_KEYS includes all
reasoning.* keys is missing the 'reasoning.verbosity' key; update the test in
runtimeSettings.reasoningSummary.test.ts to add an expectation that
PROFILE_EPHEMERAL_KEYS contains 'reasoning.verbosity' (i.e., add
expect(PROFILE_EPHEMERAL_KEYS).toContain('reasoning.verbosity'); alongside the
other reasoning.* assertions) so the PROFILE_EPHEMERAL_KEYS coverage remains
complete.
… summaries
Adds text.verbosity parameter to OpenAI Responses API requests to enable
thinking/reasoning summary output from GPT-5.x Codex models.
Changes:
- Add text.verbosity ephemeral setting (low/medium/high)
- Add reasoning.summary ephemeral setting (auto/concise/detailed/none)
- Both settings saveable via /profile save
- Add autocomplete support in /set command
- Send text: { verbosity } in request body per codex-rs implementation
- Send reasoning.summary in request body
- Add include: ['reasoning.encrypted_content'] when reasoning enabled
- Add encryptedContent field to ThinkingBlock for round-trip
- Enhanced SSE parsing with debug logging for reasoning events
- Update codex.config alias with default reasoning settings
Issue: #922
Actionable comments posted: 1
🤖 Fix all issues with AI agents
In `@packages/core/src/providers/openai/parseResponsesStream.ts`:
- Around line 232-244: The yielded "thinking" block is missing the
encryptedContent field even though encryptedContent is logged earlier; inside
the generator that yields the AI thinking block (where thoughtText is used) add
encryptedContent: encryptedContent to the block object so the yielded payload
includes the preserved reasoning content (ensure you update the block in the
yield that constructs { type: 'thinking', thought: thoughtText, sourceField:
'reasoning_content' } to include encryptedContent).
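For illustration, a hedged sketch of the block this prompt wants yielded, carrying encryptedContent through; the field names follow the review text and buildThinkingBlock is a hypothetical helper, not project code.

// Builds the thinking block payload; encryptedContent is only attached when present.
function buildThinkingBlock(thoughtText: string, encryptedContent?: string) {
  return {
    type: 'thinking' as const,
    thought: thoughtText,
    sourceField: 'reasoning_content' as const,
    ...(encryptedContent ? { encryptedContent } : {}),
  };
}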
♻️ Duplicate comments (4)
packages/vscode-ide-companion/NOTICES.txt (1)
36-40: Add the MIT license text for @hono/node-server. The license text for @hono/node-server@1.19.9 is still missing. This issue was previously flagged and the MIT License text was provided in earlier review comments.
packages/core/src/services/history/IContent.ts (1)
177-193: Encrypted-only thinking blocks may be dropped by validation. Content validation still keys off thought only; encrypted-only reasoning can be treated as empty.
packages/core/src/providers/openai/parseResponsesStream.ts (1)
196-246: Potential duplicate reasoning block emissions remain unaddressed. Per the past review, both response.reasoning_text.done (lines 140-155) and response.output_item.done with type=reasoning (lines 201-244) can emit thinking blocks for the same reasoning content. The buffer resets (lines 218-225) only prevent duplication when the fallback path is used, but if item.summary or item.content arrays are present in output_item.done, a thinking block is yielded regardless of whether reasoning_text.done already emitted. Consider adding a Set<string> to track emitted reasoning by item.id to prevent yielding the same reasoning block twice.
packages/core/src/providers/openai-responses/OpenAIResponsesProvider.ts (1)
790-795: User-specified tool_choice is still unconditionally overwritten. Per the past review, this code should check if tool_choice was already provided (e.g., 'required' or a specific function) before defaulting to 'auto'. The current implementation overwrites any user-specified value.
   if (responsesTools && responsesTools.length > 0) {
     request.tools = responsesTools;
-    // Per codex-rs: always set tool_choice and parallel_tool_calls when tools are present
-    request.tool_choice = 'auto';
+    // Per codex-rs: set tool_choice when tools are present, respecting user-specified values
+    if (!request.tool_choice) {
+      request.tool_choice = 'auto';
+    }
     request.parallel_tool_calls = true;
   }
🧹 Nitpick comments (2)
packages/core/src/providers/openai/parseResponsesStream.ts (1)
95-112: Remove or use the lastLoggedType tracking variable. The lastLoggedType variable is assigned (lines 98-100) but never used for any conditional logic. Either remove it or implement the intended deduplication behavior.
-  // SSE event visibility for debugging reasoning support.
-  // We log to stderr directly so it shows up in debug logs even if
-  // Track last logged type to avoid duplicate logs
-  if (event.type !== lastLoggedType) {
-    lastLoggedType = event.type;
-  }
-  // Debug: Log ALL events with full details
packages/core/src/providers/openai-responses/OpenAIResponsesProvider.ts (1)
593-607: Consider using a more unique ID for reasoning items. Date.now() (line 599) can produce duplicate IDs if multiple thinking blocks are processed within the same millisecond. Consider using a counter or random suffix similar to generateSyntheticCallId().
+  let reasoningIdCounter = 0;
   for (const thinkingBlock of thinkingBlocks) {
     if (thinkingBlock.encryptedContent) {
       input.push({
         type: 'reasoning',
-        id: `reasoning_${Date.now()}`,
+        id: `reasoning_${Date.now()}_${reasoningIdCounter++}`,
         summary: [
           { type: 'summary_text', text: thinkingBlock.thought },
         ],
         encrypted_content: thinkingBlock.encryptedContent,
       });
     }
   }
📜 Review details
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
⛔ Files ignored due to path filters (1)
package-lock.json is excluded by !**/package-lock.json
📒 Files selected for processing (17)
packages/cli/src/providers/aliases/codex.config
packages/cli/src/providers/providerAliases.codex.reasoningSummary.test.ts
packages/cli/src/runtime/runtimeSettings.reasoningSummary.test.ts
packages/cli/src/runtime/runtimeSettings.ts
packages/cli/src/settings/ephemeralSettings.reasoningSummary.test.ts
packages/cli/src/settings/ephemeralSettings.textVerbosity.test.ts
packages/cli/src/settings/ephemeralSettings.ts
packages/cli/src/ui/commands/setCommand.test.ts
packages/cli/src/ui/commands/setCommand.ts
packages/core/src/providers/openai-responses/OpenAIResponsesProvider.ts
packages/core/src/providers/openai-responses/__tests__/OpenAIResponsesProvider.reasoningInclude.test.ts
packages/core/src/providers/openai-responses/__tests__/OpenAIResponsesProvider.reasoningSummary.test.ts
packages/core/src/providers/openai-responses/__tests__/OpenAIResponsesProvider.textVerbosity.test.ts
packages/core/src/providers/openai/openaiRequestParams.ts
packages/core/src/providers/openai/parseResponsesStream.ts
packages/core/src/services/history/IContent.ts
packages/vscode-ide-companion/NOTICES.txt
🚧 Files skipped from review as they are similar to previous changes (7)
- packages/cli/src/ui/commands/setCommand.test.ts
- packages/core/src/providers/openai/openaiRequestParams.ts
- packages/core/src/providers/openai-responses/__tests__/OpenAIResponsesProvider.textVerbosity.test.ts
- packages/cli/src/ui/commands/setCommand.ts
- packages/cli/src/providers/aliases/codex.config
- packages/cli/src/runtime/runtimeSettings.reasoningSummary.test.ts
- packages/cli/src/settings/ephemeralSettings.reasoningSummary.test.ts
🧰 Additional context used
🧠 Learnings (3)
📓 Common learnings
Learnt from: e2720pjk
Repo: vybestack/llxprt-code PR: 583
File: packages/core/src/providers/openai/OpenAIProvider.ts:935-959
Timestamp: 2025-11-16T22:51:26.374Z
Learning: In the llxprt-code codebase (packages/core/src/providers/openai/OpenAIProvider.ts), tools like `run_shell_command` use internal streaming only for real-time UI updates during execution, but each tool execution produces exactly ONE final `ToolResponseBlock` containing the full result. The streaming chunks are never sent to the LLM and are not converted into multiple tool messages. The OpenAI Chat Completions API requires that each tool call (tool_call_id) corresponds to exactly one message with role 'tool', so duplicate tool response detection that removes subsequent tool messages with the same tool_call_id is correct and necessary for API compliance.
📚 Learning: 2025-11-16T22:51:26.374Z
Learnt from: e2720pjk
Repo: vybestack/llxprt-code PR: 583
File: packages/core/src/providers/openai/OpenAIProvider.ts:935-959
Timestamp: 2025-11-16T22:51:26.374Z
Learning: In the llxprt-code codebase (packages/core/src/providers/openai/OpenAIProvider.ts), tools like `run_shell_command` use internal streaming only for real-time UI updates during execution, but each tool execution produces exactly ONE final `ToolResponseBlock` containing the full result. The streaming chunks are never sent to the LLM and are not converted into multiple tool messages. The OpenAI Chat Completions API requires that each tool call (tool_call_id) corresponds to exactly one message with role 'tool', so duplicate tool response detection that removes subsequent tool messages with the same tool_call_id is correct and necessary for API compliance.
Applied to files:
packages/core/src/providers/openai-responses/__tests__/OpenAIResponsesProvider.reasoningSummary.test.tspackages/core/src/providers/openai/parseResponsesStream.tspackages/core/src/providers/openai-responses/OpenAIResponsesProvider.ts
📚 Learning: 2026-01-13T19:28:00.789Z
Learnt from: acoliver
Repo: vybestack/llxprt-code PR: 0
File: :0-0
Timestamp: 2026-01-13T19:28:00.789Z
Learning: In the llxprt-code codebase (`packages/core/src/providers/anthropic/AnthropicProvider.ts`), Anthropic's API returns `contentBlock.input` as an already-parsed JavaScript object, not a JSON string. The code was incorrectly calling `JSON.stringify(contentBlock.input)` before passing it to `processToolParameters()`, which was designed for OpenAI-style string parameters. This causes arrays and other complex types to be corrupted into strings (e.g., `paths` array becomes a string `"[\"**/*.toml\"]"` instead of actual array). The fix is to use `contentBlock.input` directly without stringifying for Anthropic provider.
Applied to files:
packages/core/src/services/history/IContent.tspackages/core/src/providers/openai-responses/OpenAIResponsesProvider.ts
🧬 Code graph analysis (5)
packages/core/src/providers/openai-responses/__tests__/OpenAIResponsesProvider.reasoningSummary.test.ts (1)
packages/core/src/providers/openai-responses/OpenAIResponsesProvider.ts (1)
OpenAIResponsesProvider(70-1015)
packages/cli/src/providers/providerAliases.codex.reasoningSummary.test.ts (1)
packages/cli/src/providers/providerAliases.ts (1)
loadProviderAliasEntries(115-150)
packages/core/src/providers/openai-responses/__tests__/OpenAIResponsesProvider.reasoningInclude.test.ts (1)
packages/core/src/providers/openai-responses/OpenAIResponsesProvider.ts (1)
OpenAIResponsesProvider(70-1015)
packages/cli/src/settings/ephemeralSettings.textVerbosity.test.ts (1)
packages/cli/src/settings/ephemeralSettings.ts (2)
parseEphemeralSettingValue (127-525)
ephemeralSettingHelp (9-107)
packages/core/src/providers/openai-responses/OpenAIResponsesProvider.ts (1)
packages/core/src/services/history/IContent.ts (1)
ThinkingBlock(177-194)
🔇 Additional comments (29)
packages/vscode-ide-companion/NOTICES.txt (1)
10-10: LGTM! The dependency version updates and the new json-schema-typed entry with complete BSD 2-Clause License text are properly documented.
Also applies to: 1019-1019, 2272-2333
packages/cli/src/settings/ephemeralSettings.textVerbosity.test.ts (1)
13-62: LGTM — solid coverage for text.verbosity parsing and help text.
packages/cli/src/providers/providerAliases.codex.reasoningSummary.test.ts (1)
19-45: LGTM — good safety net for codex alias reasoning defaults.
packages/cli/src/runtime/runtimeSettings.ts (1)
895-932: LGTM — new ephemeral keys are correctly surfaced for profile snapshots.
packages/cli/src/settings/ephemeralSettings.ts (3)
77-80: Help text additions look good.
443-469: Validation for reasoning.summary and text.verbosity looks correct.
556-561: Function is unused in production; concern about value coercion is theoretical but valid. isValidEphemeralSetting is only tested, never called in production code. The stringification does lose type information (undefined → "undefined", objects → "[object Object]"), but existing tests confirm rejection of non-string types works due to parseValue re-parsing strings. However, the current design is fragile: if this function is intended as a public API, consider either:
- Removing the String() coercion and accepting only string inputs (matching parseEphemeralSettingValue's contract)
- Or explicitly documenting that typed inputs are deliberately stringified, with clear test coverage for edge cases like undefined and objects
packages/core/src/providers/openai-responses/__tests__/OpenAIResponsesProvider.reasoningSummary.test.ts (1)
23-366: LGTM — comprehensive coverage for reasoning.summary request shaping.
packages/core/src/providers/openai-responses/__tests__/OpenAIResponsesProvider.reasoningInclude.test.ts (8)
1-56: Well-structured test setup with proper isolation. The test file has good organization with proper beforeEach/afterEach hooks for mock cleanup and runtime context management. The use of setActiveProviderRuntimeContext ensures proper isolation between tests.
58-118: LGTM! Test correctly verifies that include: ["reasoning.encrypted_content"] is added when reasoning.enabled=true. The request body capture pattern and JSON parsing are appropriate.
120-183: LGTM! Good coverage for the case where reasoning.effort triggers the include parameter without explicit reasoning.enabled. Verifying both the include parameter and the effort value in the request body is thorough.
185-246: LGTM! Good negative test case ensuring the include parameter isn't added when reasoning isn't requested. This prevents unnecessary API overhead.
248-318: LGTM! Critical test ensuring client-side settings (enabled, includeInContext, includeInResponse) are stripped from the API request while preserving API-relevant fields like effort. This prevents API errors from unrecognized fields.
322-399: LGTM! Comprehensive test for parsing response.output_item.done events with reasoning type. Good verification of both the human-readable summary text and the encryptedContent preservation for round-trip.
401-472: LGTM! Good test for delta accumulation from response.reasoning_summary_text.delta events. Verifies both parts of the streamed reasoning are accumulated and appear in the final thinking block.
475-662: LGTM! Excellent coverage for the reasoning context round-trip behavior. Tests correctly verify that encrypted_content is included in subsequent requests when includeInContext=true and stripped when false. This is essential for maintaining reasoning continuity across conversation turns.
packages/core/src/providers/openai/parseResponsesStream.ts (5)
10-12: LGTM! Good addition of DebugLogger for reasoning event tracing and separate accumulators for reasoningText and reasoningSummaryText. This addresses the past review comment about tracking them separately.
Also applies to: 67-70
14-50: LGTM! Type extensions correctly model the OpenAI Responses API reasoning event schema with text, content_index, summary_index, and the summary/content/encrypted_content fields on items.
126-138: LGTM! Delta accumulation correctly uses separate buffers for reasoning_text and reasoning_summary_text events, addressing the past review concern about merging distinct content types.
140-172: LGTM! The *.done handlers correctly yield thinking blocks using either the final event.text or the accumulated buffer, then reset the buffer. The consistent use of sourceField: 'reasoning_content' maintains compatibility with round-trip serialization.
292-320: LGTM! Good safety net to flush any remaining accumulated reasoning before emitting usage metadata. This ensures reasoning content isn't lost if the stream ends without explicit *.done events.
packages/core/src/providers/openai-responses/OpenAIResponsesProvider.ts (8)
29-35: LGTM! Good addition of the ThinkingBlock import to support reasoning block handling in the provider.
541-556: LGTM! The ResponsesInputItem type extension correctly models the reasoning input format for the Responses API, enabling round-trip reasoning content with summary and encrypted_content fields.
567-569: LGTM! Good default behavior allowing reasoning in context unless explicitly disabled. The !== false check handles undefined/null gracefully.
797-819: LGTM! The reasoning include parameter logic correctly computes shouldRequestReasoning based on reasoning.enabled or reasoning.effort, and adds the appropriate include parameter. Debug logging provides good observability.
821-835: LGTM! Good handling of reasoning.summary with proper validation and safe object creation. Skipping 'none' values prevents unnecessary API parameters.
842-859: LGTM! Good validation of text.verbosity against allowed values (low, medium, high) with case-insensitive matching. The structure aligns with the OpenAI Responses API format.
351-367: LGTM! The updated type signature correctly includes the reasoning variant, maintaining consistency with the ResponsesInputItem union type.
762-770: Add clarifying comment: reasoning items are permanently excluded from Codex requests without re-addition. Lines 762-770 filter reasoning items before synthetic config injection, but they are never re-added to the final request. While reasoning is configured via the include and request.reasoning parameters, the permanent removal of reasoning items from the input array is not documented. Clarify whether this is intentional (e.g., a Codex API limitation) or if reasoning items from context should be re-added after synthetic injection.
…d deduplicate thinking blocks
- Add includeThinkingInResponse option to parseResponsesStream to filter thinking blocks at parse time
- Pass reasoning.includeInResponse setting from OpenAIResponsesProvider to parseResponsesStream
- Fix thinking block duplication in interactive mode by:
  - Resetting thinkingBlocksRef at start of each stream (useGeminiStream)
  - Deduplicating thought events by content (useGeminiStream)
  - Only showing thinking blocks when item is not pending (GeminiMessage)
- Add emittedThoughts Set to deduplicate across all emission paths in parseResponsesStream
- Non-interactive mode already works correctly with consolidated buffer approach
Fixes #922
Actionable comments posted: 5
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
packages/core/src/providers/openai-responses/OpenAIResponsesProvider.ts (1)
568-605: Reasoning item IDs can collide within a request.
Date.now() can return the same value for multiple thinking blocks in the same loop, producing duplicate ids in the input array. That can cause items to overwrite or be rejected. Use a per-request counter or UUID.
🔧 Proposed fix (monotonic counter)
-  const includeReasoningInContext =
-    options.settings?.get('reasoning.includeInContext') !== false;
+  const includeReasoningInContext =
+    options.settings?.get('reasoning.includeInContext') !== false;
+  let reasoningCounter = 0;
   ...
-  input.push({
-    type: 'reasoning',
-    id: `reasoning_${Date.now()}`,
+  input.push({
+    type: 'reasoning',
+    id: `reasoning_${Date.now()}_${reasoningCounter++}`,
🤖 Fix all issues with AI agents
In `@packages/cli/src/nonInteractiveCli.test.ts`:
- Around line 749-847: The two it.skip test blocks are declared outside the
describe('runNonInteractive') scope so they reference out-of-scope fixtures
(mockGeminiClient, mockConfig, createStreamFromEvents, processStdoutSpy) and
cause TypeScript failures; move those it.skip blocks into the existing
describe('runNonInteractive') block (or create a new describe with the same
beforeEach/fixtures) so they can access mockGeminiClient, mockConfig,
createStreamFromEvents, and processStdoutSpy, ensuring the tests use the same
setup/teardown as the other tests.
In `@packages/cli/src/nonInteractiveCli.ts`:
- Around line 171-189: The flushThoughtBuffer currently emits raw <think> tags
when includeThinking is true even in STREAM_JSON mode; update the logic that
computes includeThinking (and/or the flushThoughtBuffer path) to also check that
streamJsonOutput is false (i.e., only emit thinking when !streamJsonOutput), or
route thinking through the StreamJsonFormatter when streamJsonOutput is true;
modify the evaluation of includeThinking (which currently depends on jsonOutput
and config.getEphemeralSetting) and/or guard flushThoughtBuffer so that
thoughtBuffer is never written directly to stdout in STREAM_JSON mode.
In `@packages/cli/src/ui/hooks/useGeminiStream.ts`:
- Around line 951-1010: Trim event.value.subject and event.value.description
before composing thoughtContent in the ServerGeminiEventType.Thought handler
(use the existing symbols: event.value, thinkingBlocksRef, setThought,
ThinkingBlock, setPendingHistoryItem) and if the trimmed subject+description
result is empty/whitespace, skip creating/adding the ThinkingBlock and do not
call setThought or update setPendingHistoryItem; otherwise proceed as before
using the trimmed values to build thoughtContent and add the block.
In `@packages/core/src/providers/openai-responses/OpenAIResponsesProvider.ts`:
- Around line 918-919: The debug line in OpenAIResponsesProvider that calls
this.logger.debug(() => `Request body FULL: ${requestBody}`) must not print raw
requestBody (PII/secret risk); replace it with a redacted summary by creating or
calling a sanitizer (e.g., sanitizeRequestBody/redactRequestBody) that strips
sensitive fields (prompts, encryptedReasoning, keys) or returns only top-level
keys/lengths, then log that sanitized summary via this.logger.debug; ensure
references to requestBody in the log use the sanitized output and keep the
original requestBody unmodified.
In `@packages/core/src/providers/openai/parseResponsesStream.ts`:
- Around line 196-246: The reasoning blocks are currently skipped when
includeThinkingInResponse is false; instead, modify the handlers for the event
cases 'response.reasoning_text.done' and 'response.reasoning_summary_text.done'
(and similarly for the output_item.done and response.done fallback handlers) to
still yield the thinking block but with isHidden: true when
includeThinkingInResponse is false, and preserve encryptedContent if present;
i.e., where the code now gates on includeThinkingInResponse to decide to emit,
always add a yield that sets isHidden: !includeThinkingInResponse (or only sets
isHidden when false) while keeping emittedThoughts checks, thought content
(reasoningText / reasoningSummaryText / output item text), sourceField, and any
encryptedContent, then clear the accumulated buffers (reasoningText,
reasoningSummaryText, etc.) as before.
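A compact sketch of what that change amounts to inside the parser (hedged; names such as `includeThinkingInResponse` and `encryptedContent` follow this review, and the exact ThinkingBlock/IContent wrapper is elided):

```typescript
// Sketch only: build the block either way and hide it instead of dropping it,
// so encrypted reasoning can still round-trip. The existing emittedThoughts
// dedupe and buffer clearing around this point stay as they are.
function buildThinkingBlock(
  thought: string,
  includeThinkingInResponse: boolean,
  encryptedContent?: string,
) {
  return {
    type: 'thinking' as const,
    thought,
    isHidden: !includeThinkingInResponse,
    ...(encryptedContent ? { encryptedContent } : {}),
  };
}
```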
📜 Review details
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (11)
- packages/cli/src/nonInteractiveCli.test.ts
- packages/cli/src/nonInteractiveCli.ts
- packages/cli/src/settings/ephemeralSettings.textVerbosity.test.ts
- packages/cli/src/ui/commands/setCommand.ts
- packages/cli/src/ui/components/messages/GeminiMessage.tsx
- packages/cli/src/ui/hooks/useGeminiStream.thinking.test.tsx
- packages/cli/src/ui/hooks/useGeminiStream.ts
- packages/core/src/providers/openai-responses/OpenAIResponsesProvider.ts
- packages/core/src/providers/openai-responses/__tests__/OpenAIResponsesProvider.reasoningInclude.test.ts
- packages/core/src/providers/openai/parseResponsesStream.reasoning.test.ts
- packages/core/src/providers/openai/parseResponsesStream.ts
🚧 Files skipped from review as they are similar to previous changes (2)
- packages/cli/src/settings/ephemeralSettings.textVerbosity.test.ts
- packages/cli/src/ui/commands/setCommand.ts
🧰 Additional context used
🧠 Learnings (3)
📓 Common learnings
Learnt from: e2720pjk
Repo: vybestack/llxprt-code PR: 583
File: packages/core/src/providers/openai/OpenAIProvider.ts:935-959
Timestamp: 2025-11-16T22:51:26.374Z
Learning: In the llxprt-code codebase (packages/core/src/providers/openai/OpenAIProvider.ts), tools like `run_shell_command` use internal streaming only for real-time UI updates during execution, but each tool execution produces exactly ONE final `ToolResponseBlock` containing the full result. The streaming chunks are never sent to the LLM and are not converted into multiple tool messages. The OpenAI Chat Completions API requires that each tool call (tool_call_id) corresponds to exactly one message with role 'tool', so duplicate tool response detection that removes subsequent tool messages with the same tool_call_id is correct and necessary for API compliance.
📚 Learning: 2025-11-16T22:51:26.374Z
Learnt from: e2720pjk
Repo: vybestack/llxprt-code PR: 583
File: packages/core/src/providers/openai/OpenAIProvider.ts:935-959
Timestamp: 2025-11-16T22:51:26.374Z
Learning: In the llxprt-code codebase (packages/core/src/providers/openai/OpenAIProvider.ts), tools like `run_shell_command` use internal streaming only for real-time UI updates during execution, but each tool execution produces exactly ONE final `ToolResponseBlock` containing the full result. The streaming chunks are never sent to the LLM and are not converted into multiple tool messages. The OpenAI Chat Completions API requires that each tool call (tool_call_id) corresponds to exactly one message with role 'tool', so duplicate tool response detection that removes subsequent tool messages with the same tool_call_id is correct and necessary for API compliance.
Applied to files:
- packages/core/src/providers/openai/parseResponsesStream.reasoning.test.ts
- packages/cli/src/nonInteractiveCli.test.ts
- packages/cli/src/ui/hooks/useGeminiStream.ts
- packages/cli/src/nonInteractiveCli.ts
- packages/core/src/providers/openai-responses/OpenAIResponsesProvider.ts
- packages/core/src/providers/openai/parseResponsesStream.ts
📚 Learning: 2026-01-13T19:28:00.789Z
Learnt from: acoliver
Repo: vybestack/llxprt-code PR: 0
File: :0-0
Timestamp: 2026-01-13T19:28:00.789Z
Learning: In the llxprt-code codebase (`packages/core/src/providers/anthropic/AnthropicProvider.ts`), Anthropic's API returns `contentBlock.input` as an already-parsed JavaScript object, not a JSON string. The code was incorrectly calling `JSON.stringify(contentBlock.input)` before passing it to `processToolParameters()`, which was designed for OpenAI-style string parameters. This causes arrays and other complex types to be corrupted into strings (e.g., `paths` array becomes a string `"[\"**/*.toml\"]"` instead of actual array). The fix is to use `contentBlock.input` directly without stringifying for Anthropic provider.
Applied to files:
- packages/cli/src/ui/hooks/useGeminiStream.ts
- packages/core/src/providers/openai-responses/OpenAIResponsesProvider.ts
- packages/core/src/providers/openai/parseResponsesStream.ts
🧬 Code graph analysis (4)
packages/core/src/providers/openai/parseResponsesStream.reasoning.test.ts (2)
- packages/core/src/services/history/IContent.ts (1)
  - IContent (21-40)
- packages/core/src/providers/openai/parseResponsesStream.ts (1)
  - parseResponsesStream (89-462)

packages/cli/src/nonInteractiveCli.test.ts (2)
- packages/core/src/core/turn.ts (1)
  - ServerGeminiStreamEvent (222-239)
- packages/cli/src/nonInteractiveCli.ts (1)
  - runNonInteractive (35-437)

packages/cli/src/ui/hooks/useGeminiStream.ts (2)
- packages/core/src/integration-tests/geminiChat-isolation.integration.test.ts (1)
  - stream (101-111)
- packages/core/src/services/history/IContent.ts (1)
  - ThinkingBlock (177-194)

packages/core/src/providers/openai-responses/OpenAIResponsesProvider.ts (2)
- packages/core/src/services/history/IContent.ts (1)
  - ThinkingBlock (177-194)
- packages/core/src/providers/openai/parseResponsesStream.ts (2)
  - ParseResponsesStreamOptions (80-87)
  - parseResponsesStream (89-462)
🔇 Additional comments (13)
packages/cli/src/ui/hooks/useGeminiStream.thinking.test.tsx (3)
175-194: Mock aligns with updated useStateAndRef behavior.
Looks good and mirrors the hook’s contract cleanly.
399-455: Replacement semantics test is clear and focused.
Good coverage for overwrite vs. append behavior.
507-546: Pending thinking blocks coverage looks solid.
Nice check that pending items surface thinking before content arrives.

packages/cli/src/ui/components/messages/GeminiMessage.tsx (1)
47-50: Pending-item suppression avoids duplicate thinking output.
This guard matches the streaming UX expectations. Also applies to: 59-60
packages/cli/src/ui/hooks/useGeminiStream.ts (2)
665-685: Streaming updates now preserve thinking blocks correctly.
Keeps pending history consistent while content is streaming.
1107-1107: Dependency list update is correct.
Including `setPendingHistoryItem` avoids stale closures.

packages/cli/src/nonInteractiveCli.ts (1)
203-219: Thought buffering/flush ordering looks solid. Flushing buffered thoughts right before content/tool calls and at stream end keeps output order stable.
Also applies to: 260-262, 302-302
packages/cli/src/nonInteractiveCli.test.ts (1)
196-226: Good coverage for coalesced thought output. The test validates that buffered thoughts are emitted once before content, matching the new buffering behavior.
packages/core/src/providers/openai/parseResponsesStream.ts (1)
63-194: Reasoning delta buffering and spacing helper look solid.
`appendReasoningDelta` and separate handling for reasoning vs summary deltas keep output readable and deterministic.

packages/core/src/providers/openai/parseResponsesStream.reasoning.test.ts (1)
1-310: Reasoning stream test coverage is strong. The suite exercises interleaving, empty deltas, spacing, dedupe, and usage ordering in a way that matches the new parser behavior.
packages/core/src/providers/openai-responses/OpenAIResponsesProvider.ts (2)
763-771: Confirm Codex should drop reasoning input items. In Codex mode, reasoning items are filtered out entirely before the synthetic config injection. If Codex accepts reasoning input items, this blocks `reasoning.includeInContext` from working. Please confirm with Codex /responses behavior (or add a targeted test) that this is intended.
997-1006: Passing includeThinkingInResponse into the parser is a good alignment. This keeps streaming output consistent with the runtime setting.
packages/core/src/providers/openai-responses/__tests__/OpenAIResponsesProvider.reasoningInclude.test.ts (1)
1-660: Great coverage for reasoning include/encrypted_content flows. These tests exercise request shaping, SSE parsing, and context reinjection across the key reasoning settings.
- Normalize thought subject/description by trimming whitespace
- Skip creating thinking blocks when subject === description (prevents 'Preparing: Preparing' style duplication)
- Skip empty/whitespace-only thoughts entirely
- Prevent thinking output in STREAM_JSON mode (would corrupt JSON stream)

Fixes #922
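A minimal sketch of the normalization this commit describes (the function name and shapes are illustrative, assuming a thought with `subject`/`description` strings):

```typescript
// Returns null when the thought should be skipped entirely.
function normalizeThought(value: { subject: string; description: string }) {
  const subject = value.subject.trim();
  const description = value.description.trim();

  // Skip empty/whitespace-only thoughts.
  if (!subject && !description) return null;

  // Avoid "Preparing: Preparing"-style duplication.
  if (subject === description) return { subject, description: '' };

  return { subject, description };
}
```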
- OpenAIResponsesProvider: respect user-specified tool_choice (don't override)
- OpenAIResponsesProvider: fix reasoning item ID collision with counter
- OpenAIResponsesProvider: redact PII from debug logs (keys-only summary)
- parseResponsesStream: emit hidden reasoning when includeThinkingInResponse=false (preserves encrypted_content for round-trip while hiding UI display)
- IContent: treat encryptedContent as valid content for thinking blocks
- nonInteractiveCli.test: move skipped tests inside describe scope
- runtimeSettings.test: add text.verbosity assertion
- NOTICES.txt: add MIT license text for @hono/node-server

Issue: #922
PR: #1156
Actionable comments posted: 1
🤖 Fix all issues with AI agents
In `@packages/core/src/providers/openai-responses/OpenAIResponsesProvider.ts`:
- Around line 597-611: When adding reasoning items in OpenAIResponsesProvider
(inside the includeReasoningInContext block that iterates thinkingBlocks), guard
against undefined/empty thinkingBlock.thought before inserting a summary_text
entry: either skip adding a summary element if thinkingBlock.thought is falsy or
substitute a safe fallback (e.g., an empty string) so you never create { type:
'summary_text', text: undefined }; update the loop that builds the summary array
for each reasoning_{...} entry to check thinkingBlock.thought and only include
valid summary objects.
🧹 Nitpick comments (1)
packages/core/src/providers/openai/parseResponsesStream.ts (1)
134-162: Consider gating verbose debug logging behind a flag or reducing frequency. The debug logging is helpful for development but may generate significant log volume in production. The per-event logging (lines 142-154) and reasoning-specific logging (lines 147-162) could impact performance with high-throughput streams.

♻️ Optional: Add a verbose flag or use trace level

Consider either:
- Adding a `verbose` option to `ParseResponsesStreamOptions` to control detailed logging
- Using a more granular log level (e.g., `trace`) for per-event logs while keeping reasoning logs at `debug`

This is not blocking since debug logs are typically disabled in production.
📜 Review details
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (6)
- packages/cli/src/nonInteractiveCli.test.ts
- packages/cli/src/runtime/runtimeSettings.reasoningSummary.test.ts
- packages/core/src/providers/openai-responses/OpenAIResponsesProvider.ts
- packages/core/src/providers/openai/parseResponsesStream.ts
- packages/core/src/services/history/IContent.ts
- packages/vscode-ide-companion/NOTICES.txt
🚧 Files skipped from review as they are similar to previous changes (2)
- packages/cli/src/nonInteractiveCli.test.ts
- packages/cli/src/runtime/runtimeSettings.reasoningSummary.test.ts
🧰 Additional context used
🧠 Learnings (4)
📓 Common learnings
Learnt from: e2720pjk
Repo: vybestack/llxprt-code PR: 583
File: packages/core/src/providers/openai/OpenAIProvider.ts:935-959
Timestamp: 2025-11-16T22:51:26.374Z
Learning: In the llxprt-code codebase (packages/core/src/providers/openai/OpenAIProvider.ts), tools like `run_shell_command` use internal streaming only for real-time UI updates during execution, but each tool execution produces exactly ONE final `ToolResponseBlock` containing the full result. The streaming chunks are never sent to the LLM and are not converted into multiple tool messages. The OpenAI Chat Completions API requires that each tool call (tool_call_id) corresponds to exactly one message with role 'tool', so duplicate tool response detection that removes subsequent tool messages with the same tool_call_id is correct and necessary for API compliance.
Learnt from: acoliver
Repo: vybestack/llxprt-code PR: 0
File: :0-0
Timestamp: 2026-01-13T19:28:00.789Z
Learning: In the llxprt-code codebase (`packages/core/src/providers/anthropic/AnthropicProvider.ts`), Anthropic's API returns `contentBlock.input` as an already-parsed JavaScript object, not a JSON string. The code was incorrectly calling `JSON.stringify(contentBlock.input)` before passing it to `processToolParameters()`, which was designed for OpenAI-style string parameters. This causes arrays and other complex types to be corrupted into strings (e.g., `paths` array becomes a string `"[\"**/*.toml\"]"` instead of actual array). The fix is to use `contentBlock.input` directly without stringifying for Anthropic provider.
📚 Learning: 2026-01-13T19:28:00.789Z
Learnt from: acoliver
Repo: vybestack/llxprt-code PR: 0
File: :0-0
Timestamp: 2026-01-13T19:28:00.789Z
Learning: In the llxprt-code codebase (`packages/core/src/providers/anthropic/AnthropicProvider.ts`), Anthropic's API returns `contentBlock.input` as an already-parsed JavaScript object, not a JSON string. The code was incorrectly calling `JSON.stringify(contentBlock.input)` before passing it to `processToolParameters()`, which was designed for OpenAI-style string parameters. This causes arrays and other complex types to be corrupted into strings (e.g., `paths` array becomes a string `"[\"**/*.toml\"]"` instead of actual array). The fix is to use `contentBlock.input` directly without stringifying for Anthropic provider.
Applied to files:
- packages/core/src/providers/openai-responses/OpenAIResponsesProvider.ts
- packages/core/src/providers/openai/parseResponsesStream.ts
- packages/core/src/services/history/IContent.ts
📚 Learning: 2025-11-16T22:51:26.374Z
Learnt from: e2720pjk
Repo: vybestack/llxprt-code PR: 583
File: packages/core/src/providers/openai/OpenAIProvider.ts:935-959
Timestamp: 2025-11-16T22:51:26.374Z
Learning: In the llxprt-code codebase (packages/core/src/providers/openai/OpenAIProvider.ts), tools like `run_shell_command` use internal streaming only for real-time UI updates during execution, but each tool execution produces exactly ONE final `ToolResponseBlock` containing the full result. The streaming chunks are never sent to the LLM and are not converted into multiple tool messages. The OpenAI Chat Completions API requires that each tool call (tool_call_id) corresponds to exactly one message with role 'tool', so duplicate tool response detection that removes subsequent tool messages with the same tool_call_id is correct and necessary for API compliance.
Applied to files:
- packages/core/src/providers/openai-responses/OpenAIResponsesProvider.ts
- packages/core/src/providers/openai/parseResponsesStream.ts
📚 Learning: 2025-12-14T16:16:39.939Z
Learnt from: acoliver
Repo: vybestack/llxprt-code PR: 0
File: :0-0
Timestamp: 2025-12-14T16:16:39.939Z
Learning: MiniMax M2 model (hf:MiniMax/MiniMax-M2) requires reasoning.stripFromContext set to "none" to preserve interleaved thinking tokens in conversation history, as dropping thinking content harms agent reliability.
Applied to files:
packages/core/src/providers/openai/parseResponsesStream.ts
🧬 Code graph analysis (2)
packages/core/src/providers/openai-responses/OpenAIResponsesProvider.ts (2)
- packages/core/src/services/history/IContent.ts (1)
  - ThinkingBlock (177-194)
- packages/core/src/providers/openai/parseResponsesStream.ts (2)
  - ParseResponsesStreamOptions (80-87)
  - parseResponsesStream (89-457)

packages/core/src/providers/openai/parseResponsesStream.ts (1)
- packages/core/src/services/history/IContent.ts (2)
  - IContent (21-40)
  - ContentBlock (93-99)
🔇 Additional comments (25)
packages/vscode-ide-companion/NOTICES.txt (4)
10-10: LGTM: Dependency version update. The `@modelcontextprotocol/sdk` version update from 1.24.3 to 1.25.2 is properly documented with complete MIT license text.
36-59: LGTM: License text issue resolved. The complete MIT license text for `@hono/node-server@1.19.9` is now properly included, addressing the previous review concern.
1037-1037: LGTM: Dependency patch update. The qs library patch version update from 6.14.0 to 6.14.1 is properly documented with complete BSD 3-Clause license text.
2290-2351: LGTM: Comprehensive license attribution. The [email protected] entry includes thorough BSD 2-Clause license text with detailed copyright attributions covering both the library source code and JSON Schema specification documentation across multiple draft versions.
packages/core/src/services/history/IContent.ts (2)
191-193: LGTM! New `encryptedContent` field for round-trip reasoning support. The addition of the `encryptedContent` field to `ThinkingBlock` enables preservation of OpenAI's encrypted reasoning content for stateless round-trip contexts, aligning with the PR objectives.
235-241: LGTM! Validation logic correctly treats encrypted-only blocks as valid. The updated `hasContent` check properly considers `encryptedContent` as valid content, ensuring encrypted-only thinking blocks are preserved for round-trip reasoning even when `thought` is empty.

packages/core/src/providers/openai-responses/OpenAIResponsesProvider.ts (11)
32-35: LGTM! Added necessary imports for reasoning support. The `ThinkingBlock` import enables proper typing for reasoning block handling in this provider.
362-367: LGTM! Extended input union to support reasoning items. The reasoning type addition to the input union matches the OpenAI Responses API schema for reasoning items with `id`, `summary`, and `encrypted_content` fields.
551-557: LGTM! ResponsesInputItem union includes reasoning variant. This correctly models the Responses API input types including the new reasoning variant needed for round-trip context.
568-573: LGTM! Reasoning context configuration with per-request ID counter. The `includeReasoningInContext` setting and `reasoningIdCounter` provide proper control over reasoning inclusion and unique ID generation within a request.
766-774: LGTM! Defensive filtering of reasoning items for Codex synthetic injection. Correctly filters out reasoning items before injecting synthetic config file read, preventing potential issues with the Codex path.
796-801: LGTM! Correctly respects user-specified `tool_choice`. The conditional check now only defaults to `'auto'` when `tool_choice` is not already set, preserving user-specified values like `'required'` or specific function names.
804-830: LGTM! Reasoning configuration for request building. The logic correctly derives `shouldRequestReasoning` and adds the `include` parameter for encrypted content when reasoning is enabled. The debug logging helps with troubleshooting.
832-846: LGTM! Reasoning summary configuration. Correctly adds `reasoning.summary` to the request when set and not `'none'`, matching the codex-rs implementation pattern.
854-871: LGTM! Text verbosity support for Responses API. The `text.verbosity` field is correctly validated and added to the request when set to a valid value (`low`, `medium`, `high`).
924-927: LGTM! Debug logging now redacts sensitive data. The debug log now only outputs request keys rather than the full body, addressing the previous review concern about PII/secret exposure.
1006-1014: LGTM! Stream options passed to parseResponsesStream. The `includeThinkingInResponse` option is correctly passed to the stream parser, ensuring reasoning blocks respect the user's visibility preference.

packages/core/src/providers/openai/parseResponsesStream.ts (8)
8-15: LGTM! Updated imports for reasoning support. The addition of the `ContentBlock` import and `DebugLogger` enables proper typing and debugging for reasoning block handling.
17-53: LGTM! Extended ResponsesEvent type for reasoning fields. The additions of `text`, `content_index`, `summary_index`, and the extended `item` shape (with `summary`, `content`, `encrypted_content` arrays) correctly model the OpenAI Responses API reasoning events.
63-75: LGTM! Helper for concatenating reasoning deltas with smart spacing. The `appendReasoningDelta` function handles edge cases well—returning early for empty values and inserting a space only when needed between word characters and parentheses.
77-107: LGTM! Stream options and deduplication tracking. The `ParseResponsesStreamOptions` interface with `includeThinkingInResponse` and the `emittedThoughts` Set provide proper configuration and deduplication for reasoning blocks.
176-194: LGTM! Delta accumulation for reasoning text and summary. The handlers correctly accumulate deltas using `appendReasoningDelta` and keep `reasoningText` and `reasoningSummaryText` as separate buffers.
266-333: LGTM! Comprehensive reasoning handling in output_item.done. The handler correctly:
- Extracts thought text from `summary` and `content` arrays with fallback to accumulated buffers
- Deduplicates against `emittedThoughts`
- Includes `encryptedContent` when present
- Sets `isHidden` based on `includeThinkingInResponse`
- Clears buffers after processing
379-421: LGTM! Fallback emission for remaining reasoning on stream completion. The `response.completed`/`response.done` handler correctly emits any accumulated reasoning that wasn't emitted via other event paths, with proper deduplication and `isHidden` flag.
196-242: Reasoning events do not include `encrypted_content` in `.done` events. The OpenAI Responses API does not include `encrypted_content` in `response.reasoning_text.done` or `response.reasoning_summary_text.done` events. The `encrypted_content` field is only available on the reasoning item object itself (retrieved separately via `include=["reasoning.encrypted_content"]`), not on the streaming event payloads. The current code is correct and does not need modification.

Likely an incorrect or invalid review comment.
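For reference, a helper with the behavior described in the 63-75 comment above might look roughly like this. It is a sketch, not the file's actual implementation, and the spacing heuristic is an assumption about what "between word characters and parentheses" means in practice.

```typescript
// Append a streaming delta to an accumulated reasoning buffer, inserting a
// single space only when two word-like characters would otherwise collide.
function appendReasoningDelta(existing: string, delta: string): string {
  if (!delta) return existing; // ignore empty deltas
  if (!existing) return delta; // first chunk needs no separator

  const lastChar = existing[existing.length - 1];
  const firstChar = delta[0];
  // Heuristic: add a space between word characters (or closing/opening
  // parentheses) split across deltas; otherwise concatenate as-is.
  const needsSpace = /[\w)]/.test(lastChar) && /[\w(]/.test(firstChar);
  return needsSpace ? `${existing} ${delta}` : existing + delta;
}
```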
When building reasoning items for the Responses API context, ensure
thinkingBlock.thought has a fallback to empty string to avoid creating
invalid { type: 'summary_text', text: undefined } objects.
Addresses CodeRabbit review #3720508548
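In code terms, the guard this commit describes could look something like the following sketch. The reasoning-item fields (`type`, `id`, `summary`, `encrypted_content`) follow the Responses API shape discussed in this PR; the helper name is illustrative.

```typescript
// Only include a summary entry when there is actual thought text, so we never
// emit { type: 'summary_text', text: undefined }.
function toReasoningItem(
  id: string,
  thinkingBlock: { thought?: string; encryptedContent?: string },
) {
  const summary = thinkingBlock.thought
    ? [{ type: 'summary_text' as const, text: thinkingBlock.thought }]
    : [];
  return {
    type: 'reasoning' as const,
    id,
    summary,
    ...(thinkingBlock.encryptedContent
      ? { encrypted_content: thinkingBlock.encryptedContent }
      : {}),
  };
}
```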
Integrate main branch features while preserving issue #922 fixes:

- Add settings registry with auto-validation for enum/boolean/number types
- Fix reasoning.summary='none' handling to exclude from API requests
- Add reasoning.effort to request body when reasoning is enabled
- Filter 'reasoning' object from modelParams to prevent override
- Update pending history item with thinking blocks during streaming
- Add setPendingHistoryItem to useCallback dependency array

Key changes for #922:

- reasoning.summary and text.verbosity settings in registry
- codex.config includes reasoning.summary=auto and reasoning.effort=medium
- OpenAIResponsesProvider reads reasoning settings from model-behavior
- ThinkingBlocks visible in pendingHistoryItems during streaming
…lation

The thinkingBlocksRef was accumulating across multiple tool calls within the same turn. When each chunk was committed to history via flushPendingHistoryItem, the thinking blocks from previous chunks would be included again, causing the pyramid effect where each message showed all previous thoughts plus the current one.

Fix: Clear thinkingBlocksRef.current = [] immediately after committing an item to history. This ensures each history item only contains the thinking blocks that occurred since the last commit.

Fixes #922
Actionable comments posted: 4
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
packages/core/src/providers/openai/parseResponsesStream.ts (1)
98-333: Encrypted content can be lost when `response.reasoning_text.done` precedes `response.output_item.done` with the same reasoning text. Per OpenAI's Responses API, both event types can carry the same reasoning text content, but `encrypted_content` appears only on the reasoning output item (in `response.output_item.done`). Since events can arrive out of order, if `reasoning_text.done` emits first and adds the text to `emittedThoughts`, the subsequent `output_item.done` is skipped entirely—dropping the `encrypted_content` needed for stateless workflows (ZDR/`store: false`).

Use a Map to track whether encrypted content has been captured per thought, allowing a hidden re-emit on first arrival of `encrypted_content`:

Proposed fix

- const emittedThoughts = new Set<string>();
+ const emittedThoughts = new Map<string, { hasEncrypted: boolean }>();

In the `response.reasoning_text.done` and `response.reasoning_summary_text.done` handlers, replace `emittedThoughts.add(...)` with `emittedThoughts.set(thoughtContent, { hasEncrypted: false })`.

In the `response.output_item.done` handler for reasoning items, check:

const prior = emittedThoughts.get(finalThought);
const hasEncryptedContent = Boolean(event.item?.encrypted_content);
const shouldEmit =
  finalThought && (!prior || (hasEncryptedContent && !prior.hasEncrypted));
if (shouldEmit) {
  const shouldHide = !includeThinkingInResponse || Boolean(prior);
  // ... emit with isHidden: shouldHide ...
  emittedThoughts.set(finalThought, {
    hasEncrypted: Boolean(prior?.hasEncrypted) || hasEncryptedContent,
  });
}

Add a test covering the scenario where `reasoning_text.done` precedes `output_item.done` with `encrypted_content`.
🤖 Fix all issues with AI agents
In `@packages/cli/src/nonInteractiveCli.test.ts`:
- Around line 812-860: The test "should NOT emit pyramid-style repeated prefixes
in non-interactive CLI" is missing the ephemeral setting mock; before calling
runNonInteractive, add the same ephemeral stub used in the other tests by
configuring mockSettings (e.g., set mockSettings.ephemeral or stub
mockSettings.get('ephemeral')/mockSettings.getSetting to return the same value
used in the active test) so runNonInteractive and its use of mockSettings behave
consistently with the other tests that already include the ephemeral mock.
- Around line 762-810: The skipped test "should accumulate multiple Thought
events and flush once on content boundary" is missing a mock for
getEphemeralSetting to enable ephemeral reasoning output and its assertions
don't match how thought subjects are concatenated; before calling
runNonInteractive mock getEphemeralSetting to return true for the reasoning
setting (same setup used in the active test) so thinking output is produced, and
update the assertions against processStdoutSpy to match the actual thoughtText
construction used by the code (e.g., check for the concatenated subject format
produced by the Thought events from GeminiEventType.Thought rather than "First
thought"/"Second thought").
In `@packages/cli/src/settings/ephemeralSettings.ts`:
- Around line 70-75: isValidEphemeralSetting currently checks the raw key
against validEphemeralKeys before parsing, so alias keys (e.g., "max-tokens")
are rejected; modify isValidEphemeralSetting to first resolve aliases by calling
the same alias-resolution logic used in parseEphemeralSettingValue (i.e., derive
the canonical key from the input key) and then check that canonical key against
validEphemeralKeys and call parseEphemeralSettingValue using the canonical key
so validation behavior matches parseEphemeralSettingValue; update references to
validEphemeralKeys, parseEphemeralSettingValue, and isValidEphemeralSetting
accordingly.
In `@packages/core/src/providers/openai-responses/OpenAIResponsesProvider.ts`:
- Around line 560-563: The code reads reasoning.includeInContext /
reasoning.includeInResponse / reasoning.enabled only from SettingsService (via
options.settings), ignoring per-invocation ephemerals or modelBehavior; update
the logic in OpenAIResponsesProvider (around where includeReasoningInContext is
computed) to first check any invocation-level overrides (e.g.,
options.ephemeral, options.modelBehavior or options.invocationModelBehavior) and
the modelBehavior fetchers for reasoning flags, and only if those are undefined
fall back to options.settings?.get('reasoning...'); apply the same precedence
for all three flags (includeInContext, includeInResponse, enabled) and the other
occurrence block later (the region referenced at 796-810) so ephemeral /
per-call overrides take precedence over SettingsService.
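The precedence the prompt asks for could be expressed with a small helper along these lines. This is a sketch only: the option field names (`ephemerals`, `modelBehavior`, `settings`) are assumptions about the shape of `options`, not the provider's real types.

```typescript
// Resolve a reasoning flag with invocation-level overrides taking precedence
// over model behavior, and SettingsService only as the final fallback.
function resolveReasoningFlag(
  key: string,
  options: {
    ephemerals?: Record<string, unknown>;
    modelBehavior?: { get?: (key: string) => unknown };
    settings?: { get: (key: string) => unknown };
  },
): boolean {
  const fromEphemeral = options.ephemerals?.[key];
  if (fromEphemeral !== undefined) return fromEphemeral !== false;

  const fromBehavior = options.modelBehavior?.get?.(key);
  if (fromBehavior !== undefined) return fromBehavior !== false;

  return options.settings?.get(key) !== false;
}

// e.g. resolveReasoningFlag('reasoning.includeInContext', options)
```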
📜 Review details
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (22)
- packages/cli/src/nonInteractiveCli.test.ts
- packages/cli/src/nonInteractiveCli.ts
- packages/cli/src/providers/aliases/codex.config
- packages/cli/src/providers/providerAliases.codex.reasoningSummary.test.ts
- packages/cli/src/runtime/runtimeSettings.reasoningSummary.test.ts
- packages/cli/src/runtime/runtimeSettings.ts
- packages/cli/src/settings/ephemeralSettings.reasoningSummary.test.ts
- packages/cli/src/settings/ephemeralSettings.textVerbosity.test.ts
- packages/cli/src/settings/ephemeralSettings.ts
- packages/cli/src/ui/commands/setCommand.ts
- packages/cli/src/ui/components/messages/GeminiMessage.tsx
- packages/cli/src/ui/hooks/useGeminiStream.thinking.test.tsx
- packages/cli/src/ui/hooks/useGeminiStream.ts
- packages/core/src/providers/openai-responses/OpenAIResponsesProvider.ts
- packages/core/src/providers/openai-responses/__tests__/OpenAIResponsesProvider.reasoningInclude.test.ts
- packages/core/src/providers/openai-responses/__tests__/OpenAIResponsesProvider.reasoningSummary.test.ts
- packages/core/src/providers/openai-responses/__tests__/OpenAIResponsesProvider.textVerbosity.test.ts
- packages/core/src/providers/openai/openaiRequestParams.ts
- packages/core/src/providers/openai/parseResponsesStream.reasoning.test.ts
- packages/core/src/providers/openai/parseResponsesStream.ts
- packages/core/src/services/history/IContent.ts
- packages/core/src/settings/settingsRegistry.ts
🚧 Files skipped from review as they are similar to previous changes (5)
- packages/cli/src/providers/aliases/codex.config
- packages/cli/src/ui/components/messages/GeminiMessage.tsx
- packages/core/src/providers/openai-responses/__tests__/OpenAIResponsesProvider.textVerbosity.test.ts
- packages/cli/src/runtime/runtimeSettings.reasoningSummary.test.ts
- packages/cli/src/providers/providerAliases.codex.reasoningSummary.test.ts
🧰 Additional context used
🧠 Learnings (5)
📓 Common learnings
Learnt from: e2720pjk
Repo: vybestack/llxprt-code PR: 583
File: packages/core/src/providers/openai/OpenAIProvider.ts:935-959
Timestamp: 2025-11-16T22:51:26.374Z
Learning: In the llxprt-code codebase (packages/core/src/providers/openai/OpenAIProvider.ts), tools like `run_shell_command` use internal streaming only for real-time UI updates during execution, but each tool execution produces exactly ONE final `ToolResponseBlock` containing the full result. The streaming chunks are never sent to the LLM and are not converted into multiple tool messages. The OpenAI Chat Completions API requires that each tool call (tool_call_id) corresponds to exactly one message with role 'tool', so duplicate tool response detection that removes subsequent tool messages with the same tool_call_id is correct and necessary for API compliance.
📚 Learning: 2026-01-13T19:28:00.789Z
Learnt from: acoliver
Repo: vybestack/llxprt-code PR: 0
File: :0-0
Timestamp: 2026-01-13T19:28:00.789Z
Learning: In the llxprt-code codebase (`packages/core/src/providers/anthropic/AnthropicProvider.ts`), Anthropic's API returns `contentBlock.input` as an already-parsed JavaScript object, not a JSON string. The code was incorrectly calling `JSON.stringify(contentBlock.input)` before passing it to `processToolParameters()`, which was designed for OpenAI-style string parameters. This causes arrays and other complex types to be corrupted into strings (e.g., `paths` array becomes a string `"[\"**/*.toml\"]"` instead of actual array). The fix is to use `contentBlock.input` directly without stringifying for Anthropic provider.
Applied to files:
- packages/core/src/services/history/IContent.ts
- packages/cli/src/nonInteractiveCli.ts
- packages/core/src/providers/openai-responses/OpenAIResponsesProvider.ts
- packages/cli/src/ui/hooks/useGeminiStream.ts
- packages/core/src/providers/openai/parseResponsesStream.ts
📚 Learning: 2025-11-16T22:51:26.374Z
Learnt from: e2720pjk
Repo: vybestack/llxprt-code PR: 583
File: packages/core/src/providers/openai/OpenAIProvider.ts:935-959
Timestamp: 2025-11-16T22:51:26.374Z
Learning: In the llxprt-code codebase (packages/core/src/providers/openai/OpenAIProvider.ts), tools like `run_shell_command` use internal streaming only for real-time UI updates during execution, but each tool execution produces exactly ONE final `ToolResponseBlock` containing the full result. The streaming chunks are never sent to the LLM and are not converted into multiple tool messages. The OpenAI Chat Completions API requires that each tool call (tool_call_id) corresponds to exactly one message with role 'tool', so duplicate tool response detection that removes subsequent tool messages with the same tool_call_id is correct and necessary for API compliance.
Applied to files:
- packages/cli/src/nonInteractiveCli.test.ts
- packages/core/src/providers/openai/parseResponsesStream.reasoning.test.ts
- packages/cli/src/nonInteractiveCli.ts
- packages/core/src/providers/openai-responses/__tests__/OpenAIResponsesProvider.reasoningSummary.test.ts
- packages/core/src/providers/openai-responses/OpenAIResponsesProvider.ts
- packages/cli/src/ui/hooks/useGeminiStream.ts
- packages/core/src/providers/openai/parseResponsesStream.ts
📚 Learning: 2026-01-03T17:53:10.145Z
Learnt from: acoliver
Repo: vybestack/llxprt-code PR: 0
File: :0-0
Timestamp: 2026-01-03T17:53:10.145Z
Learning: In Codex OAuth mode, the synthetic read_file tool call should always target AGENTS.md (matching CODEX_SYSTEM_PROMPT expectations), but include the actual userMemory content from whatever files are configured (LLXPRT.md, AGENTS.md, or both). The synthetic output should NOT disclose which files were actually loaded - the goal is to convince GPT it already read AGENTS.md to prevent redundant reads, regardless of the actual file configuration.
Applied to files:
packages/core/src/providers/openai-responses/OpenAIResponsesProvider.ts
📚 Learning: 2025-12-14T16:16:39.939Z
Learnt from: acoliver
Repo: vybestack/llxprt-code PR: 0
File: :0-0
Timestamp: 2025-12-14T16:16:39.939Z
Learning: MiniMax M2 model (hf:MiniMax/MiniMax-M2) requires reasoning.stripFromContext set to "none" to preserve interleaved thinking tokens in conversation history, as dropping thinking content harms agent reliability.
Applied to files:
packages/core/src/providers/openai/parseResponsesStream.ts
🧬 Code graph analysis (8)
packages/cli/src/settings/ephemeralSettings.reasoningSummary.test.ts (1)
- packages/cli/src/settings/ephemeralSettings.ts (2)
  - ephemeralSettingHelp (15-15)
  - isValidEphemeralSetting (70-76)

packages/cli/src/ui/hooks/useGeminiStream.thinking.test.tsx (1)
- packages/cli/src/test-utils/render.tsx (1)
  - waitFor (245-263)

packages/core/src/providers/openai/parseResponsesStream.reasoning.test.ts (3)
- packages/core/src/integration-tests/geminiChat-isolation.integration.test.ts (1)
  - stream (101-111)
- packages/core/src/services/history/IContent.ts (1)
  - IContent (21-40)
- packages/core/src/providers/openai/parseResponsesStream.ts (1)
  - parseResponsesStream (89-457)

packages/cli/src/settings/ephemeralSettings.textVerbosity.test.ts (1)
- packages/cli/src/settings/ephemeralSettings.ts (2)
  - parseEphemeralSettingValue (35-61)
  - ephemeralSettingHelp (15-15)

packages/core/src/providers/openai-responses/__tests__/OpenAIResponsesProvider.reasoningSummary.test.ts (1)
- packages/core/src/providers/openai-responses/OpenAIResponsesProvider.ts (1)
  - OpenAIResponsesProvider (70-1052)

packages/core/src/providers/openai-responses/OpenAIResponsesProvider.ts (4)
- packages/core/src/providers/BaseProvider.ts (1)
  - NormalizedGenerateChatOptions (63-80)
- packages/core/src/providers/openai-responses/buildResponsesInputFromContent.ts (1)
  - ResponsesInputItem (27-35)
- packages/core/src/services/history/IContent.ts (1)
  - ThinkingBlock (177-194)
- packages/core/src/providers/openai/parseResponsesStream.ts (2)
  - ParseResponsesStreamOptions (80-87)
  - parseResponsesStream (89-457)

packages/core/src/providers/openai-responses/__tests__/OpenAIResponsesProvider.reasoningInclude.test.ts (2)
- packages/core/src/providers/openai-responses/OpenAIResponsesProvider.ts (1)
  - OpenAIResponsesProvider (70-1052)
- packages/core/src/test-utils/providerCallOptions.ts (1)
  - createProviderCallOptions (207-257)

packages/core/src/providers/openai/parseResponsesStream.ts (1)
- packages/core/src/services/history/IContent.ts (2)
  - IContent (21-40)
  - ContentBlock (93-99)
⏰ Context from checks skipped due to timeout of 270000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (7)
- GitHub Check: Test (windows-latest, 24.x)
- GitHub Check: Test (macos-latest, 24.x)
- GitHub Check: Test (ubuntu-latest, 24.x)
- GitHub Check: E2E Test (macOS)
- GitHub Check: E2E Test (Linux) - sandbox:none
- GitHub Check: E2E Test (Linux) - sandbox:docker
- GitHub Check: Slow E2E - Win
🔇 Additional comments (27)
packages/cli/src/nonInteractiveCli.ts (4)
280-301: LGTM! Thought buffering mechanism correctly implemented. The STREAM_JSON corruption issue from the previous review is properly addressed by including `!streamJsonOutput` in the `includeThinking` guard (line 285). The buffer is correctly scoped per-turn, and `flushThoughtBuffer` properly handles empty/whitespace-only content.
315-340: LGTM! Thought buffering prevents fragmented output. The buffering approach correctly accumulates multiple thoughts with space separators before flushing as a single `<think>` block. Emoji filtering is properly applied before buffering, and blocked thoughts are skipped entirely.
342-343: LGTM! Strategic flush points ensure correct output ordering. Flushing the thought buffer before Content and ToolCallRequest events ensures reasoning appears before response text and tool invocations, matching the expected UX where thinking precedes action.
Also applies to: 385-386
427-428: LGTM! Final flush handles edge cases. The final `flushThoughtBuffer()` after the event loop ensures any remaining buffered thoughts are emitted even if the stream ends without a Content or ToolCallRequest event (e.g., reasoning-only streams).

packages/cli/src/ui/hooks/useGeminiStream.thinking.test.tsx (3)
176-195: Well-structured generic mock for `useStateAndRef`. The mock correctly implements the state/ref synchronization pattern with proper typing. The updater function handling (detecting function vs value) mirrors React's setState behavior appropriately.
399-455: Good test coverage for thought replacement semantics. The controlled promise resolver pattern allows precise verification that `thought` state is overwritten (not appended) across successive events. The assertions correctly validate the expected replacement behavior.
507-546: Test correctly validates pending thinking block exposure during streaming. This test aligns well with the PR objective to ensure thinking blocks are visible in `pendingHistoryItems` before content arrives, enabling real-time UI display of reasoning content.

packages/cli/src/ui/hooks/useGeminiStream.ts (2)
274-277: Correct thinking block cleanup after history commit. Clearing `thinkingBlocksRef.current` after committing ensures thinking blocks don't accumulate across multiple tool calls within the same turn. This is the appropriate location since it occurs after `addItem()` captures the blocks.
969-977: Thinking blocks now surface in pending history during streaming. This change enables real-time UI display of reasoning content by updating `pendingHistoryItem` as soon as thinking blocks arrive. The implementation correctly creates a new array reference for React state updates. One consideration: when `item` is null/undefined, this creates a pending item with empty text but populated `thinkingBlocks`. This appears intentional based on the test "should expose pending thinking blocks before content arrives", but verify this doesn't cause visual artifacts in the UI when thinking blocks appear before any text content.
1078-1081: Correct dependency array update. Adding `setPendingHistoryItem` to the dependency array is required since it's now used within `processGeminiStreamEvents` to update pending history with thinking blocks.

packages/cli/src/nonInteractiveCli.test.ts (1)
203-233: LGTM! The test correctly sets up the ephemeral setting mock to enable thinking output, creates a stream with multiple thought events followed by content, and verifies that thoughts are coalesced into a single `<think>` block before the content is emitted.

packages/cli/src/settings/ephemeralSettings.reasoningSummary.test.ts (1)
1-70: Nice coverage for reasoning.summary validation and help text. Tests are clear and exercise valid values, invalid values, and type mismatches.

packages/cli/src/settings/ephemeralSettings.textVerbosity.test.ts (1)
1-55: Good test coverage for text.verbosity parsing and help metadata.

packages/core/src/providers/openai/openaiRequestParams.ts (1)
55-86: Sanitization updates look good.

packages/cli/src/runtime/runtimeSettings.ts (1)
897-899: Good centralization of profile-persistable ephemeral keys.

packages/core/src/services/history/IContent.ts (1)
191-249: Thinking block + validation updates look solid (encrypted content included).

packages/cli/src/ui/commands/setCommand.ts (1)
316-340: Nice addition of reasoning.summary and text.verbosity to /set completions.
The new options and hints align well with the existing direct-setting schema and should improve discoverability.

packages/core/src/settings/settingsRegistry.ts (2)
228-245: Registry additions look good.
New reasoning.summary and text.verbosity entries are well-scoped and consistent with existing model-behavior settings.
1023-1058: Auto-validation for enum/boolean/number is a solid improvement.
Reduces custom validator boilerplate while keeping type constraints explicit.

packages/core/src/providers/openai/parseResponsesStream.reasoning.test.ts (1)
1-311: Great coverage for reasoning stream edge cases and ordering.
The tests exercise deltas, summaries, whitespace, and usage metadata in a clear, robust way.

packages/core/src/providers/openai-responses/__tests__/OpenAIResponsesProvider.reasoningInclude.test.ts (1)
1-660: Test suite is thorough and well-targeted.
It validates include wiring, encrypted_content handling, and context round-tripping effectively.

packages/core/src/providers/openai-responses/__tests__/OpenAIResponsesProvider.reasoningSummary.test.ts (1)
1-367: Reasoning.summary scenarios are well covered.
Good spread of values and omission cases with clear request assertions.

packages/core/src/providers/openai-responses/OpenAIResponsesProvider.ts (1)
753-761: Add explanatory comment or confirm Codex /responses API reasoning support. The code intentionally filters reasoning items only in Codex mode (lines 753–761), but provides no comment explaining whether this is due to Codex API limitations or architectural preference. If Codex /responses does support reasoning items with `encrypted_content` (like the standard OpenAI Responses API), this filtering may unnecessarily break context preservation on subsequent requests when `reasoning.includeInContext` is enabled. Either document the limitation or restore reasoning items to the Codex input if the API supports them.

packages/core/src/providers/openai/parseResponsesStream.ts (4)
8-35: Strong event typing for reasoning/summary payloads.
Clearer typing here will make the new reasoning branches safer and easier to evolve.
63-93: Helper + options wiring looks solid.
Nice encapsulation of reasoning delta concatenation and an explicit options surface.
134-163: Debug logging is useful for SSE bring‑up.
The extra reasoning logs should help validate event ordering during integration.
379-422: Fallback emission and usage metadata handling look good.
This should prevent reasoning loss on response termination while keeping usage intact.
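As a rough illustration of the per-turn buffering praised in the nonInteractiveCli.ts comments above, the pattern could be sketched as follows. This is not the CLI's actual code; the real implementation also handles emoji filtering and STREAM_JSON mode.

```typescript
// Accumulate thought text per turn and write it once, wrapped in <think> tags,
// right before the next content or tool call is printed.
let thoughtBuffer = '';

function bufferThought(text: string): void {
  const trimmed = text.trim();
  if (!trimmed) return;
  thoughtBuffer = thoughtBuffer ? `${thoughtBuffer} ${trimmed}` : trimmed;
}

function flushThoughtBuffer(write: (s: string) => void): void {
  if (!thoughtBuffer.trim()) return;
  write(`<think>${thoughtBuffer}</think>\n`);
  thoughtBuffer = '';
}

// Usage: call bufferThought() on Thought events, and flushThoughtBuffer()
// before emitting Content / ToolCallRequest output and once at stream end.
```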
```typescript
it.skip('should accumulate multiple Thought events and flush once on content boundary', async () => {
  const thoughtEvent1: ServerGeminiStreamEvent = {
    type: GeminiEventType.Thought,
    value: {
      subject: 'First',
      description: 'thought',
    },
  };
  const thoughtEvent2: ServerGeminiStreamEvent = {
    type: GeminiEventType.Thought,
    value: {
      subject: 'Second',
      description: 'thought',
    },
  };
  const contentEvent: ServerGeminiStreamEvent = {
    type: GeminiEventType.Content,
    value: 'Response text',
  };
  const finishedEvent: ServerGeminiStreamEvent = {
    type: GeminiEventType.Finished,
    value: { reason: undefined, usageMetadata: { totalTokenCount: 10 } },
  };

  mockGeminiClient.sendMessageStream.mockReturnValueOnce(
    createStreamFromEvents([
      thoughtEvent1,
      thoughtEvent2,
      contentEvent,
      finishedEvent,
    ]),
  );

  await runNonInteractive({
    config: mockConfig,
    settings: mockSettings,
    input: 'test query',
    prompt_id: 'test-prompt-id',
  });

  const thinkingOutputs = processStdoutSpy.mock.calls.filter(
    ([output]: [string]) => output.includes('<think>'),
  );

  expect(thinkingOutputs).toHaveLength(1);
  const thinkingText = thinkingOutputs[0][0];
  expect(thinkingText).toContain('First thought');
  expect(thinkingText).toContain('Second thought');
});
```
Skipped tests are missing ephemeral setting mock for thinking output.
When these tests are unskipped, they will likely fail because getEphemeralSetting is not mocked to return true for the reasoning setting. The active test at line 204-206 demonstrates the required setup. Additionally, verify that the expected assertion format matches the implementation—the active test expects subjects concatenated ("First Second"), but this test expects "First thought" and "Second thought" which may not match how thoughtText is constructed.
🛠️ Suggested fix when unskipping
- it.skip('should accumulate multiple Thought events and flush once on content boundary', async () => {
+ it('should accumulate multiple Thought events and flush once on content boundary', async () => {
+ mockConfig.getEphemeralSetting = vi
+ .fn<(key: string) => boolean | undefined>()
+ .mockReturnValue(true);
+
  const thoughtEvent1: ServerGeminiStreamEvent = {

Also verify the expected output format matches the implementation (e.g., "First: thought" vs "First thought").
🤖 Prompt for AI Agents
In `@packages/cli/src/nonInteractiveCli.test.ts` around lines 762 - 810, The
skipped test "should accumulate multiple Thought events and flush once on
content boundary" is missing a mock for getEphemeralSetting to enable ephemeral
reasoning output and its assertions don't match how thought subjects are
concatenated; before calling runNonInteractive mock getEphemeralSetting to
return true for the reasoning setting (same setup used in the active test) so
thinking output is produced, and update the assertions against processStdoutSpy
to match the actual thoughtText construction used by the code (e.g., check for
the concatenated subject format produced by the Thought events from
GeminiEventType.Thought rather than "First thought"/"Second thought").
```typescript
it.skip('should NOT emit pyramid-style repeated prefixes in non-interactive CLI', async () => {
  const thoughtEvent1: ServerGeminiStreamEvent = {
    type: GeminiEventType.Thought,
    value: {
      subject: 'Analyzing',
      description: '',
    },
  };
  const thoughtEvent2: ServerGeminiStreamEvent = {
    type: GeminiEventType.Thought,
    value: {
      subject: 'request',
      description: '',
    },
  };
  const contentEvent: ServerGeminiStreamEvent = {
    type: GeminiEventType.Content,
    value: 'Response',
  };
  const finishedEvent: ServerGeminiStreamEvent = {
    type: GeminiEventType.Finished,
    value: { reason: undefined, usageMetadata: { totalTokenCount: 10 } },
  };

  mockGeminiClient.sendMessageStream.mockReturnValueOnce(
    createStreamFromEvents([
      thoughtEvent1,
      thoughtEvent2,
      contentEvent,
      finishedEvent,
    ]),
  );

  await runNonInteractive({
    config: mockConfig,
    settings: mockSettings,
    input: 'test query',
    prompt_id: 'test-prompt-id',
  });

  const thinkingOutputs = processStdoutSpy.mock.calls.filter(
    ([output]: [string]) => output.includes('<think>'),
  );

  expect(thinkingOutputs).toHaveLength(1);
  const thinkingText = thinkingOutputs[0][0];
  const thoughtCount = (thinkingText.match(/Analyzing/g) || []).length;
  expect(thoughtCount).toBe(1);
});
```
Same issue: missing ephemeral setting mock.
This skipped test also needs the ephemeral setting mock when unskipped, consistent with the first skipped test and the active test at line 203.
🤖 Prompt for AI Agents
In `@packages/cli/src/nonInteractiveCli.test.ts` around lines 812 - 860, The test
"should NOT emit pyramid-style repeated prefixes in non-interactive CLI" is
missing the ephemeral setting mock; before calling runNonInteractive, add the
same ephemeral stub used in the other tests by configuring mockSettings (e.g.,
set mockSettings.ephemeral or stub
mockSettings.get('ephemeral')/mockSettings.getSetting to return the same value
used in the active test) so runNonInteractive and its use of mockSettings behave
consistently with the other tests that already include the ephemeral mock.
The test was missing the getModelBehavior function in the invocation mock, causing a TypeError when the provider tried to call it to get reasoning settings.
1. parseResponsesStream.ts: Use Map instead of Set for emittedThoughts to track whether encrypted_content has been captured. This handles the case where reasoning_text.done arrives before output_item.done with encrypted_content - we can re-emit a hidden block with the encrypted content for stateless/ZDR workflows.
2. OpenAIResponsesProvider.ts: Add invocation ephemerals precedence for reasoning.includeInContext, reasoning.includeInResponse, and reasoning.enabled settings. This ensures /set (ephemeral) and per-call overrides take effect.
3. ephemeralSettings.ts: Resolve aliases before validating in isValidEphemeralSetting so alias keys like 'max-tokens' work consistently with parseEphemeralSettingValue.
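A sketch of the alias handling described in item 3 above. The map contents, signatures, and error behavior are illustrative assumptions, not the file's actual identifiers.

```typescript
// Resolve alias keys (e.g. "max-tokens") to their canonical form before
// validating, so isValidEphemeralSetting matches parseEphemeralSettingValue.
const ephemeralAliases: Record<string, string> = {
  'max-tokens': 'max_tokens', // hypothetical alias mapping
};

function canonicalKey(key: string): string {
  return ephemeralAliases[key] ?? key;
}

function isValidEphemeralSetting(
  key: string,
  value: unknown,
  validKeys: Set<string>,
  parse: (key: string, value: unknown) => unknown, // stand-in for parseEphemeralSettingValue
): boolean {
  const resolved = canonicalKey(key);
  if (!validKeys.has(resolved)) return false;
  try {
    parse(resolved, value);
    return true;
  } catch {
    return false;
  }
}
```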
The Thought event handler was pushing every thought to thinkingBlocksRef without checking if that exact thought text was already present. When the same thought arrives multiple times (e.g., from different event sources or retries), this caused duplicate display.

Added check: only push to thinkingBlocksRef if the thought text isn't already in the array.

Fixes #922
Actionable comments posted: 3
🤖 Fix all issues with AI agents
In `@packages/core/src/providers/openai-responses/OpenAIResponsesProvider.ts`:
- Around line 745-769: The current Codex branch removes 'reasoning' items by
creating requestInputWithoutReasoning and passing that to
injectSyntheticConfigFileRead, which drops encrypted reasoning from context;
instead, keep requestInput intact (do not filter out items with
type==='reasoning') and only create a separate list for injection: compute an
itemsToInject = requestInput.filter(item => !('type' in item && item.type ===
'reasoning')) and call this.injectSyntheticConfigFileRead(itemsToInject,
options, userMemory) so the synthetic tool call/result is prepended but the
original requestInput (including reasoning items, honored when
includeReasoningInContext is true) is preserved; adjust references to
requestInputWithoutReasoning accordingly and remove the assignment that replaces
requestInput with the filtered array.
In `@packages/core/src/providers/openai/parseResponsesStream.ts`:
- Around line 80-87: The docs for
ParseResponsesStreamOptions.includeThinkingInResponse are inaccurate: update the
comment on includeThinkingInResponse to state the actual behavior — when true
the parser emits ThinkingBlock content normally, and when false the parser still
emits thinking content but as hidden/metadata-only thinking blocks (not surfaced
as visible response text) so callers know thinking is produced but marked
hidden; reference the ParseResponsesStreamOptions interface and the
includeThinkingInResponse property when making this change.
- Around line 137-165: The current SSE debug logs in parseResponsesStream.ts
expose raw event payloads (e.g., event.delta, event.text, JSON.stringify(event),
event.item.summary/content) which can leak PII; update the logging in the block
that references event, lastLoggedType and logger.debug so it never logs raw text
or full JSON. Instead log high-level metadata only: event.type,
event.item?.type, summary_index, content_index, and numeric lengths (e.g., delta
length, text length, encrypted_content length) or presence booleans for
sensitive fields; remove any JSON.stringify(event) and any slicing of
event.delta/event.text or direct content values and ensure reasoning-specific
branches log only these safe metadata fields. Ensure lastLoggedType handling
remains unchanged.
📜 Review details
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (3)
packages/cli/src/settings/ephemeralSettings.ts
packages/core/src/providers/openai-responses/OpenAIResponsesProvider.ts
packages/core/src/providers/openai/parseResponsesStream.ts
🧰 Additional context used
🧠 Learnings (5)
📓 Common learnings
Learnt from: e2720pjk
Repo: vybestack/llxprt-code PR: 583
File: packages/core/src/providers/openai/OpenAIProvider.ts:935-959
Timestamp: 2025-11-16T22:51:26.374Z
Learning: In the llxprt-code codebase (packages/core/src/providers/openai/OpenAIProvider.ts), tools like `run_shell_command` use internal streaming only for real-time UI updates during execution, but each tool execution produces exactly ONE final `ToolResponseBlock` containing the full result. The streaming chunks are never sent to the LLM and are not converted into multiple tool messages. The OpenAI Chat Completions API requires that each tool call (tool_call_id) corresponds to exactly one message with role 'tool', so duplicate tool response detection that removes subsequent tool messages with the same tool_call_id is correct and necessary for API compliance.
Learnt from: acoliver
Repo: vybestack/llxprt-code PR: 0
File: :0-0
Timestamp: 2026-01-13T19:28:00.789Z
Learning: In the llxprt-code codebase (`packages/core/src/providers/anthropic/AnthropicProvider.ts`), Anthropic's API returns `contentBlock.input` as an already-parsed JavaScript object, not a JSON string. The code was incorrectly calling `JSON.stringify(contentBlock.input)` before passing it to `processToolParameters()`, which was designed for OpenAI-style string parameters. This causes arrays and other complex types to be corrupted into strings (e.g., `paths` array becomes a string `"[\"**/*.toml\"]"` instead of actual array). The fix is to use `contentBlock.input` directly without stringifying for Anthropic provider.
📚 Learning: 2025-11-16T22:51:26.374Z
Learnt from: e2720pjk
Repo: vybestack/llxprt-code PR: 583
File: packages/core/src/providers/openai/OpenAIProvider.ts:935-959
Timestamp: 2025-11-16T22:51:26.374Z
Learning: In the llxprt-code codebase (packages/core/src/providers/openai/OpenAIProvider.ts), tools like `run_shell_command` use internal streaming only for real-time UI updates during execution, but each tool execution produces exactly ONE final `ToolResponseBlock` containing the full result. The streaming chunks are never sent to the LLM and are not converted into multiple tool messages. The OpenAI Chat Completions API requires that each tool call (tool_call_id) corresponds to exactly one message with role 'tool', so duplicate tool response detection that removes subsequent tool messages with the same tool_call_id is correct and necessary for API compliance.
Applied to files:
packages/core/src/providers/openai-responses/OpenAIResponsesProvider.ts
packages/core/src/providers/openai/parseResponsesStream.ts
📚 Learning: 2026-01-13T19:28:00.789Z
Learnt from: acoliver
Repo: vybestack/llxprt-code PR: 0
File: :0-0
Timestamp: 2026-01-13T19:28:00.789Z
Learning: In the llxprt-code codebase (`packages/core/src/providers/anthropic/AnthropicProvider.ts`), Anthropic's API returns `contentBlock.input` as an already-parsed JavaScript object, not a JSON string. The code was incorrectly calling `JSON.stringify(contentBlock.input)` before passing it to `processToolParameters()`, which was designed for OpenAI-style string parameters. This causes arrays and other complex types to be corrupted into strings (e.g., `paths` array becomes a string `"[\"**/*.toml\"]"` instead of actual array). The fix is to use `contentBlock.input` directly without stringifying for Anthropic provider.
Applied to files:
packages/core/src/providers/openai-responses/OpenAIResponsesProvider.ts
packages/core/src/providers/openai/parseResponsesStream.ts
📚 Learning: 2026-01-03T17:53:10.145Z
Learnt from: acoliver
Repo: vybestack/llxprt-code PR: 0
File: :0-0
Timestamp: 2026-01-03T17:53:10.145Z
Learning: In Codex OAuth mode, the synthetic read_file tool call should always target AGENTS.md (matching CODEX_SYSTEM_PROMPT expectations), but include the actual userMemory content from whatever files are configured (LLXPRT.md, AGENTS.md, or both). The synthetic output should NOT disclose which files were actually loaded - the goal is to convince GPT it already read AGENTS.md to prevent redundant reads, regardless of the actual file configuration.
Applied to files:
packages/core/src/providers/openai-responses/OpenAIResponsesProvider.ts
📚 Learning: 2025-12-14T16:16:39.939Z
Learnt from: acoliver
Repo: vybestack/llxprt-code PR: 0
File: :0-0
Timestamp: 2025-12-14T16:16:39.939Z
Learning: MiniMax M2 model (hf:MiniMax/MiniMax-M2) requires reasoning.stripFromContext set to "none" to preserve interleaved thinking tokens in conversation history, as dropping thinking content harms agent reliability.
Applied to files:
packages/core/src/providers/openai/parseResponsesStream.ts
🧬 Code graph analysis (3)
packages/cli/src/settings/ephemeralSettings.ts (1)
  packages/core/src/settings/settingsRegistry.ts (1)
    resolveAlias (849-872)
packages/core/src/providers/openai-responses/OpenAIResponsesProvider.ts (4)
  packages/core/src/providers/BaseProvider.ts (1)
    NormalizedGenerateChatOptions (63-80)
  packages/core/src/providers/openai-responses/buildResponsesInputFromContent.ts (1)
    ResponsesInputItem (27-35)
  packages/core/src/services/history/IContent.ts (1)
    ThinkingBlock (177-194)
  packages/core/src/providers/openai/parseResponsesStream.ts (2)
    ParseResponsesStreamOptions (80-87)
    parseResponsesStream (89-482)
packages/core/src/providers/openai/parseResponsesStream.ts (1)
  packages/core/src/services/history/IContent.ts (2)
    IContent (21-40)
    ContentBlock (93-99)
⏰ Context from checks skipped due to timeout of 270000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (7)
- GitHub Check: Test (ubuntu-latest, 24.x)
- GitHub Check: Test (macos-latest, 24.x)
- GitHub Check: Test (windows-latest, 24.x)
- GitHub Check: E2E Test (Linux) - sandbox:none
- GitHub Check: Slow E2E - Win
- GitHub Check: E2E Test (Linux) - sandbox:docker
- GitHub Check: E2E Test (macOS)
🔇 Additional comments (12)
packages/cli/src/settings/ephemeralSettings.ts (1)
62-78: LGTM! The implementation correctly addresses the alias resolution issue from the prior review. Aliases are now resolved before validation, ensuring consistency with parseEphemeralSettingValue. The JSDoc is clear and the logic is straightforward.
packages/core/src/providers/openai-responses/OpenAIResponsesProvider.ts (7)
328-408: Solid synthetic AGENTS.md injection.
Matches the Codex behavior requirement to always target AGENTS.md while returning actual userMemory without disclosing source files.
Based on learnings: In Codex OAuth mode, the synthetic read_file tool call should always target AGENTS.md and should not disclose which files were actually loaded.
410-470: Provider tool stubs and model params look good.
Clear, explicit “no server tools” behavior and safe settings extraction.
560-611: Reasoning-in-context injection is clean.
Encrypted reasoning is preserved and gated behind includeReasoningInContext as intended.
705-725: Request override filtering is solid.
Skipping nested reasoning avoids double‑applying settings.
803-895: Reasoning settings + text.verbosity wiring is consistent.
Nice precedence handling and explicit request shaping.
941-944: Request logging now safe.
Keys‑only logging avoids sensitive payload exposure.
1035-1044: Stream options wiring looks good.
Passing includeThinkingInResponse down to the parser is the right hook.
packages/core/src/providers/openai/parseResponsesStream.ts (4)
63-75: Nice helper extraction for reasoning deltas.
Keeps the main switch readable.
199-247: Reasoning done handlers look solid.
Hidden‑when‑disabled behavior preserves history without UI leakage.
271-354: Reasoning item handling and encrypted_content preservation are good.
Re‑emitting hidden blocks when encrypted content arrives later is a solid fix.
400-466: Fallback emission and usage handling are clean.
Good guardrails for late reasoning + usage metadata.
✏️ Tip: You can disable this entire section by setting review_details to false in your review settings.
// Build request input - filter out system messages for Codex (uses instructions field instead)
let requestInput = input;
if (isCodex) {
  // In Codex mode, system prompt goes in instructions field, not input array
  // Only filter items that have a 'role' property (function_call/function_call_output don't)
  // Cast role to string to avoid TS2367 error - ResponsesInputItem union includes both role-bearing
  // and non-role-bearing items, and we need to filter only those with role='system'
  requestInput = requestInput.filter(
    (msg) => !('role' in msg) || (msg.role as string) !== 'system',
  );

  // @issue #966: Pre-inject synthetic tool call/result for config files (LLXPRT.md/AGENTS.md)
  // This prevents the model from wasting tool calls re-reading files already injected.
  // Note: We no longer inject a steering prompt - the system prompt is properly
  // conveyed via the `instructions` field (see below).
  const requestInputWithoutReasoning = requestInput.filter(
    (item) => !('type' in item && item.type === 'reasoning'),
  );
  this.injectSyntheticConfigFileRead(
    requestInputWithoutReasoning,
    options,
    userMemory,
  );
  requestInput = requestInputWithoutReasoning;
}
Don’t drop reasoning input in Codex mode.
Filtering out reasoning items here prevents encrypted reasoning from being sent back in context even when includeReasoningInContext is true. That breaks round‑trip reasoning persistence. Preserve reasoning items and just prepend the synthetic call/output.
🔧 Suggested fix
- const requestInputWithoutReasoning = requestInput.filter(
- (item) => !('type' in item && item.type === 'reasoning'),
- );
- this.injectSyntheticConfigFileRead(
- requestInputWithoutReasoning,
- options,
- userMemory,
- );
- requestInput = requestInputWithoutReasoning;
+ this.injectSyntheticConfigFileRead(requestInput, options, userMemory);
🤖 Prompt for AI Agents
In `@packages/core/src/providers/openai-responses/OpenAIResponsesProvider.ts`
around lines 745 - 769, The current Codex branch removes 'reasoning' items by
creating requestInputWithoutReasoning and passing that to
injectSyntheticConfigFileRead, which drops encrypted reasoning from context;
instead, keep requestInput intact (do not filter out items with
type==='reasoning') and only create a separate list for injection: compute an
itemsToInject = requestInput.filter(item => !('type' in item && item.type ===
'reasoning')) and call this.injectSyntheticConfigFileRead(itemsToInject,
options, userMemory) so the synthetic tool call/result is prepended but the
original requestInput (including reasoning items, honored when
includeReasoningInContext is true) is preserved; adjust references to
requestInputWithoutReasoning accordingly and remove the assignment that replaces
requestInput with the filtered array.
export interface ParseResponsesStreamOptions {
  /**
   * Whether to emit ThinkingBlock content in the output stream.
   * When false, reasoning content is still accumulated but not yielded.
   * Defaults to true.
   */
  includeThinkingInResponse?: boolean;
}
Doc comment doesn’t match behavior.
The option says “not yielded” when false, but the code now emits hidden thinking blocks. Update the docstring to avoid misleading callers.
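Roughly, the observable difference for a consumer is only the hidden flag on the emitted block; the field names below follow the suggested doc fix and are otherwise assumptions about ThinkingBlock's shape:

// Sketch: the same reasoning text is emitted either way, but with
// includeThinkingInResponse=false it is flagged as hidden rather than dropped.
type ThinkingBlockSketch = {
  type: 'thinking';
  thought: string;
  isHidden?: boolean;
};

const whenEnabled: ThinkingBlockSketch = { type: 'thinking', thought: 'analysis…' };
const whenDisabled: ThinkingBlockSketch = {
  type: 'thinking',
  thought: 'analysis…',
  isHidden: true, // kept for history/round-tripping, not rendered in the UI
};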
✏️ Doc fix
- * Whether to emit ThinkingBlock content in the output stream.
- * When false, reasoning content is still accumulated but not yielded.
+ * Whether to emit ThinkingBlock content in the output stream.
+ * When false, reasoning content is still emitted but marked isHidden.
🤖 Prompt for AI Agents
In `@packages/core/src/providers/openai/parseResponsesStream.ts` around lines 80 -
87, The docs for ParseResponsesStreamOptions.includeThinkingInResponse are
inaccurate: update the comment on includeThinkingInResponse to state the actual
behavior — when true the parser emits ThinkingBlock content normally, and when
false the parser still emits thinking content but as hidden/metadata-only
thinking blocks (not surfaced as visible response text) so callers know thinking
is produced but marked hidden; reference the ParseResponsesStreamOptions
interface and the includeThinkingInResponse property when making this change.
// SSE event visibility for debugging reasoning support.
// We log to stderr directly so it shows up in debug logs even if
// Track last logged type to avoid duplicate logs
if (event.type !== lastLoggedType) {
  lastLoggedType = event.type;
}

// Debug: Log ALL events with full details
logger.debug(
  () =>
    `SSE event: type=${event.type}, delta="${event.delta?.slice(0, 50) ?? ''}", text="${event.text?.slice(0, 50) ?? ''}", item_type=${event.item?.type ?? 'none'}, summary_index=${event.summary_index ?? 'none'}, content_index=${event.content_index ?? 'none'}`,
);
// Extra debug for any reasoning-related events
if (
  event.type.includes('reasoning') ||
  event.item?.type === 'reasoning'
) {
  logger.debug(
    () => `REASONING SSE: ${JSON.stringify(event).slice(0, 500)}`,
  );
}

// Debug: Log raw reasoning items
if (event.item?.type === 'reasoning') {
  logger.debug(
    () =>
      `Reasoning item received: summary=${JSON.stringify(event.item?.summary)}, content=${JSON.stringify(event.item?.content)}, encrypted_content_length=${event.item?.encrypted_content?.length ?? 0}`,
  );
}
Avoid logging raw SSE content (PII risk).
The debug logs include reasoning/text deltas and full reasoning event payloads. That can leak user data into logs. Prefer logging lengths/keys only.
🔒 Suggested redaction
- logger.debug(
- () =>
- `SSE event: type=${event.type}, delta="${event.delta?.slice(0, 50) ?? ''}", text="${event.text?.slice(0, 50) ?? ''}", item_type=${event.item?.type ?? 'none'}, summary_index=${event.summary_index ?? 'none'}, content_index=${event.content_index ?? 'none'}`,
- );
+ logger.debug(
+ () =>
+ `SSE event: type=${event.type}, delta_len=${event.delta?.length ?? 0}, text_len=${event.text?.length ?? 0}, item_type=${event.item?.type ?? 'none'}, summary_index=${event.summary_index ?? 'none'}, content_index=${event.content_index ?? 'none'}`,
+ );
...
- logger.debug(
- () => `REASONING SSE: ${JSON.stringify(event).slice(0, 500)}`,
- );
+ logger.debug(
+ () =>
+ `REASONING SSE: type=${event.type}, item_id=${event.item_id ?? 'none'}, encrypted_len=${event.item?.encrypted_content?.length ?? 0}`,
+ );
📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
// SSE event visibility for debugging reasoning support.
// We log to stderr directly so it shows up in debug logs even if
// Track last logged type to avoid duplicate logs
if (event.type !== lastLoggedType) {
  lastLoggedType = event.type;
}

// Debug: Log ALL events with full details
logger.debug(
  () =>
    `SSE event: type=${event.type}, delta_len=${event.delta?.length ?? 0}, text_len=${event.text?.length ?? 0}, item_type=${event.item?.type ?? 'none'}, summary_index=${event.summary_index ?? 'none'}, content_index=${event.content_index ?? 'none'}`,
);
// Extra debug for any reasoning-related events
if (
  event.type.includes('reasoning') ||
  event.item?.type === 'reasoning'
) {
  logger.debug(
    () =>
      `REASONING SSE: type=${event.type}, item_id=${event.item_id ?? 'none'}, encrypted_len=${event.item?.encrypted_content?.length ?? 0}`,
  );
}

// Debug: Log raw reasoning items
if (event.item?.type === 'reasoning') {
  logger.debug(
    () =>
      `Reasoning item received: summary=${JSON.stringify(event.item?.summary)}, content=${JSON.stringify(event.item?.content)}, encrypted_content_length=${event.item?.encrypted_content?.length ?? 0}`,
  );
}
🤖 Prompt for AI Agents
In `@packages/core/src/providers/openai/parseResponsesStream.ts` around lines 137
- 165, The current SSE debug logs in parseResponsesStream.ts expose raw event
payloads (e.g., event.delta, event.text, JSON.stringify(event),
event.item.summary/content) which can leak PII; update the logging in the block
that references event, lastLoggedType and logger.debug so it never logs raw text
or full JSON. Instead log high-level metadata only: event.type,
event.item?.type, summary_index, content_index, and numeric lengths (e.g., delta
length, text length, encrypted_content length) or presence booleans for
sensitive fields; remove any JSON.stringify(event) and any slicing of
event.delta/event.text or direct content values and ensure reasoning-specific
branches log only these safe metadata fields. Ensure lastLoggedType handling
remains unchanged.
Per OpenAI docs: 'we highly recommend you pass back any reasoning items returned with the last function call... This allows the model to continue its reasoning process to produce better results in the most token-efficient manner.' Previously, reasoning items were filtered out before the synthetic config injection and never restored. Now we:
1. Filter reasoning only for finding the injection point
2. Merge injected items back with preserved reasoning items
3. Maintain proper order: injected -> reasoning -> non-reasoning
This follows the same pattern as MiniMax M2, which requires reasoning tokens in conversation history for agent reliability. Fixes CodeRabbit comment about Codex dropping reasoning input items.
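A sketch of the reassembly described above, with illustrative names (buildCodexInput and the item type are not the provider's real identifiers):

// Compute the injection against non-reasoning items only, then rebuild the
// request in the documented order: injected -> reasoning -> non-reasoning.
type InputItemSketch = { type?: string } & Record<string, unknown>;

function buildCodexInput(
  requestInput: InputItemSketch[],
  injected: InputItemSketch[],
): InputItemSketch[] {
  const reasoningItems = requestInput.filter((item) => item.type === 'reasoning');
  const otherItems = requestInput.filter((item) => item.type !== 'reasoning');
  return [...injected, ...reasoningItems, ...otherItems];
}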
## Summary
- parse Responses API reasoning SSE events into ThinkingBlocks
- buffer reasoning deltas and emit before usage metadata
- add unit coverage for reasoning-only, interleaved, and edge cases

## Testing
- npm run format
- npm run lint
- npm run typecheck
- npm run test
- npm run build
- node scripts/start.js --profile-load synthetic --prompt "write me a haiku"

fixes #922